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ABSTRACT 

This  report  describes  the  kernel  density  estimation  technique  and  its  application  to  range  safety 
applications.  The  kernel  density  estimation  technique  is  shown  to  be  suitable  for  developing 
probabilistic  risk  assessments  from  ground  impact  data  generated  for  guided  weapon  systems  via 
Monte  Carlo  simulations.  An  advantage  of  this  technique  is  that  it  can  be  used  to  predict  the 
probability  density  function  for  minimal  simulated  ground  impacts  with  apparently  random 
d  i  stri  buti  on.  Several  techni  ques  have  been  proposed  to  amel  i  orate  the  i  dentifi  ed  I  i  mitati  ons  of  the 
kernel  density  estimation  technique,  including  a  covariant  form  for  two-dimensional  data. 
Analysis  of  the  available  simulated  guided  weapon  ground  impact  data  has  identified  that 
arou  nd  si  X  hu  nd  red  i  mpact  poi  nts  are  suff i  ci  ent  for  generati  ng  a  probabi  I  i ty  d  i  stri  buti  on . 
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Executive  Summary 

The  Range  Safety  T  empi  ateT  ool  kit  (RSTT)  devel  opment  project  undertaken  by  Weapons 
Systems  Division  of  the  Defence,  Science  and  Technoiogy  Organisation  (DSTO)  was 
scoped  to  deveiop  probabiii Stic  risk  hazard  anaiysiscapabiii ties  for  guided  weapon  and 
sounding  rocket  triais.  The  Centre  for  Defence  Communications  and  Information 
Networking  (CDCIN),  formerly  known  asTRC  Mathematical  Modelling  (TRC),  at  the 
University  of  A  del  aide  was  contracted  to  undertake  research  and  development  in  support 
of  the  RSTT  project.  To  meet  the  objectives  of  the  RSTT  project,  DSTO  proposed  using 
M  onte  Carl o  si  mul ati ons  of  specif i c  vehi cl es  (i  ncl  udi  ng  I  i  kely  fai  I  ure  response  modes)  to 
generate  ground  impact  data  that  could  be  turned  into  a  probability  density  function.  Itis 
this  final  aspect  that  was  the  focus  of  the  research  and  development  discussed  in  this 
report. 

To  support  the  research  and  development  work,  DSTO  provided  results  (including 
ground  i  mpact  data)  from  M  onte  Carl  o  si  mul  ati  ons  of  a  generi  c  guided  weapon  system. 
Consultation  between  DSTO  and  CDCIN  identified  a  number  of  essential  activitiesforthis 
work: 

1.  Analyse  distributions  to  understand  heterogeneous  processes. 

2.  Devel  op  robust  estimation  methods:  A  pplyEVT  (ExtremeValueTheory)  and  other 
methods  to  representative  distributions  for  evaluation  of  statistical  method 
effectiveness. 

3.  Develop  an  understanding  of  "typical"  impact  distribution  data  supplied  by 
Weapons  Systems  Division  of  DSTO.  The  data  covered  a  range  of  missile  launch 
scenarios  and  failure  modes  considered  by  DSTO  to  be  typical  of  the  data  that 
mi  ght  be  generated  for  actual  weapons  systems. 

4.  Investigate  techniques  suitable  for  generating  approximate  probability  density 
f u  ncti  ons  represent!  ng  mi  ssi  I  e  i  mpact  d  ata. 

5.  I  nvesti  gate  the  convergence  properti  es  of  those  techni  ques  with  i  ncreasi  ng  size  of 
dataset. 

6.  Investigate  the  degree  of  resolution  required  to  provide  impact  distributions 
meaningful  to  use  in  a  Range  Safety  Tempi  ate  Tool  kit. 

7.  Investigate  techniques  for  generating  range  safety  templates  for  scenarios  for 
which  simulated  data  are  not  directly  available,  more  specifically,  to  investigate 
thefeasi  bi  I  ity  of  usi  ng  i  nterpol  ati  on  techni  ques  for  approxi  mati  ng  a  given  scenario 
from  other  scenarios  for  which  data  exists. 

8.  Examine  the  impact  of  alternative  distance  metrics  on  the  quality  of  the  impact 
zone  i  nterpol  ati  on  process. 


The  research  and  de/elopment  activities  undertaken  by  by  CDCiN  and  DSTO  have: 

1.  Quai  itati veiy  descri  bed  the  features  of  i mpact  distri buti on  data  that  may  affect 
subsequentstatisticai  modeiiing. 

2.  Defined  a  technique,  specificaiiy,  the  use  of  kernei  density  estimation,  for 
providing  a  statisticai  modei  of  a  specific  missiie  impact  data  set  which  estimates 
the  probabi  i  i  ty  densi  ty  f  uncti  on  of  the  i  mpact  d  i  stri  buti  on .  The  soi  uti  on  proposed 
here  i  s  purei  y  data  anai  yti  c  and  as  such  does  not  ai  i  ow  for  the  i  ncorporati  on  of  any 
su  bstanti  ve  kn  ow  i  ed  ge. 

3.  Defined  a  technique  for  combining  kernei  density  estimates  corresponding  to 
differentfaiiu  re  modes  with  in  asingieoperationai  scenario. 

4.  Defined  atechniquefor  incorporating  information  on  missiieMaximum  Energy 
Boundaries  into  theanaiysisso  as  to  refi ne  the  i mpact  zone  probabi iity  density 
function. 

5.  Defined  a  technique  for  using  the  probabiiity  density  function  together  with 
popuiation  density  information  to  obtain  estimated  injury  rates  for  a  given 
scenario. 

6.  Defi  ned  a  techni  quefor  usi  ng  the  probabi  i  ity  density  functi  on  together  with  range 
boundary  information  to  obtain  an  estimate  for  a  missiie  ieaving  a  given  range. 

7.  Defined  a  technique  for  using  the  probabiiity  density  function  to  determine  a 
conservative,  convex  safety  exciusion  zonewith  given  probabiiity  of  the  missiie 
ieaving  the  zone. 

8.  Defined  an  approximate  technique  for  defining  a  conservative  exciusion  zone 
derived  from  probabiiity  density  functions  of  different  scenarios. 

9.  Found  that  KDE  resoiutions  beyond  16  x  16  and  32  x  32  do  not  provide 
significantiy  more  accurate  information  and  hence  16  x  16  or  32  x  32  resoiutions 
appear  to  besuitabiefor  the  deveiopment  of  Range  Safety  Tempi  ates. 

10.  Found  that  at  ieast  600  observations  (impact  data  points)  shouid  be  used  in 
generating  KDEsfor  a  given  scenario. 

11.  i  dentifi  ed  situati  ons  i  n  w  hi  ch  the  Kernei  Density  Esti  mati  on  process  is  not  robust, 
generai  iy  when  tight  ci  usters  of  data  poi  nts  occur  withi  n  thedata  set.  i  n  such  cases 
the  bandwidth  parameters  automaticaiiy  generated  by  the  process  tend  to  bevery 
smaii  and  the  KDE  generated  consequentiy  "erratic".  This  report  has  suggested 
one  method  of  dynami  c  band  w  i  dth  cai  cu  i  ati  on  to  i  mprovethe  PDF  for  ci  ustered  or 
non-normai  ground  impact  distributions. 

12.  Described  acovariantformtheKernei  Density  Estimator  for  two-dimensionai  data 
that  robustiy  predicts  the  ground  impact  probabiiity  function. 

13.  Outiined  a  numericai  approach  to  ensurecomputationaiiy  accurate  and  efficient 
resuits  are  obtained  when  using  the  kernei  density  estimate  technique  with  reai 
impact  data. 

The  resuits  obtained  from  the  work  outiined  in  this  document  are  essentiai  for  the 
operation  of  the  Range  Safety  Tempi  ateTooi  kit.  RSTT  is  a  capabi  iity  for  the  generation  of 
probabiiistic  risk  hazard  anaiyses  and  weapon  danger  areas  for  guided  weapon  and 
sounding  rocket  triais.  Duetotheiargefiight  ranges  of  these  systems  and  iimited  range 
space  for  triais,  RSTT  and  its  supporting  research  are  important  for  ensuring  that  future 
system  triais  can  be  practicaiiy  conducted  in  Austraiia.  importantiy,  the  probabiiistic 
methodoiogies  presented  here  can  potentiaiiybeappiiedtoabroad  range  of  appii  cations 
that  requi rerisk  hazard  anaiysis  i nd  udi ng:  bai i i Stic  munition  testi ng,  ai  rcraftfi ight,  orbitai 
re-entry,  rocket  iaunches and  expiosive testing. 
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1.  Introduction 

1.1  Background  and  Purpose 

In  late  2004  the  Defence  Science  and  Technology  Organisation  (DSTO)  of  the  Australian 
Department  of  Defence  initiated  development  of  an  advanced,  probabilistic  range  safety 
assessment  system  for  guided  air  missiles.  The  resulting  system  is  called  the  Range  Safety 
T empi  ate  T ool  kit  (RSTT) .  The  key  output  of  RSTT  i  s  a  range  safety  tempi  ate,  w  hi  ch  def i  nes 
the  evacuation  area  for  a  planned  trial,  also  known  as  a  weapon  danger  area  (WDA)  or 
weapon  safety  footprint  area.  WDA  isthe  standard  NATO  term  for  such  an  area. 

In  designing  RSTT,  DSTO  proposed  a  template  generation  methodology  based  on  high 
f  i  del  i  ty  M  onte  Carl  0  si  mu  I  ati  on  of  the  mi  ssi  I  e,  prod  u  ci  ng  I  arge  sets  of  grou  nd  i  mpacts  for  both 
nominal  and  off-nomi  nal  (i.e.  failed)  missile  fly  outs.  One  step  in  the  proposed  methodology 
required  RSTT to  generate  two-dimensional  ground  impact  probability  density  functions  from 
the  I  arge  sets  of  ground  impact  coordinates. 

Earl  y  experi  mentati  on  i  nto  probabi  I  i  sti  c  method  ol  ogi  es  and  M  onte  C  arl  o  si  mu  I  ati  on  show  ed 
non-Gaussian  ground  impact  scatter  was  typical  for  guided  weapon  systems.  The  observed 
scatter,  and  limited  time  and  computing  resources  became  the  primary  constraints  for  the 
DSTO  approach.  The  RSTT  development  plan  therefore  cal  led  for  research  and  development 
(R&D)tobeconducted  into  appropriate  statist!  cal  techniques  for  thecal  culation  of  probability 
distributions,  from  minimal  data  sets.  As  a  large  project,  not  all  R&D  aspects  of  RSTT 
d  evel  op  ment  could  be  handled!  nter  n  allybyDSTO,sothisparticular  task  w  as  contracted  to  a 
research  centre  of  the  University  of  Adelaide,  the  Centre  for  Defence  Communications  and 
I  nformati  on  N  etworki  ng  (CDCI N ),  formerl  y  known  asTRC  M  athemati  cal  M  odel  1 1  ng  (TRC). 
TheCDCIN  team  drew  on  statistics  expertisefrom  the  University's  Applied  Mathematics 
department  and  regularly  consulted  with  theDSTO  RSTT  develop  ment  team  on  thedirection 
of  their  research. 

Three  cl  I  ent  reports  were  produced  by  CDCI  N  over  a  three  year  peri  od  from]  anuary  2005  to 
February  2008,  proposi  ng  an  i  ni ti  al  sol  uti  on  to  the  probi  em  and  then  exami  ni  ng  vari  ous  I ssues 
in  the  application  of  that  solution.  This  DSTO  Report  presents  for  public  release  the 
consolidated  analysis  and  findings  of  this  R&D  into  the  problem  of  calculating  probability 
distri  butions  from  mi  ni  mal  data  sets  for  range  safety  purposes. 
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1.2  Scope 

Representative  unclassified  sets  of  simulated  ground  impact  data  were  generated  by  DSTO 

and  provided  to  the  CDCIN  team  as  the  basis  for  their  R&D  activities.  The  activities 

undertaken  by  theteam  included  the  following: 

Phase  1: 

1.  Analyse  distributions  to  understand  heterogeneous  processes,  developing  an 
understanding  of  "typical"  impact  distribution  data  generated  by  DSTO. 

2.  Develop  robust  estimation  methods:  Investigated  EVT  (ExtremeValueTheory)  and 
other  methodswith  representative  distributions  for  evaluation  of  statistical  method 
effectiveness. 

3.  Investigated  database  reduction  and  performance  improvement  methods:  To  make  the 
RSTT  as  computationally  efficient  as  possible,  investigated  database  reduction  and 
performance  methods  for  later  implementation  in  software. 

Phase  2: 

1.  Investigated  the  convergence  properties  of  probability  density  function  estimation 
tech  n  i  q  u  es  w  i  th  i  ncreasi  n  g  si  ze  of  d  ataset. 

2.  Investigated  the  degree  of  resolution  required  to  provide  impact  distributions 
meaningful  to  use  in  RSTT. 

Phase  3: 

1.  Identified  issues  associated  with  "clustering"  of  points  and  their  impact  on  Kernel 
Density  Estimation  and  bandwidth  selection  and  provided  a  process  for  addressing 
these  issues. 

1.3  Report  structure 

This  report  is  structured  as  follows: 

•  A  table  of  acronyms  precedes  the  I  ntroduction 

•  Section  1,  this  section,  introduces  the  report. 

•  Sections  2,  3and  4  provide  the  outcome  of  the  CDCIN  activities  undertaken  intheir 
three  phases  of  work  respectively,  as  described  above  in  the  Scope  section  above. 

•  Secti  onSoutlinessu  bsequ  ent  anal  ysi  s  performed  at  DSTO  i  nto  the  appi  i  cati  on  of  non- 
di agonal  bandwidth  matrices. 

•  Section  6  provides  a  conclusion  to  the  report  and  recommendations  for  further 
investigation. 

•  Appendix  A  lists  the  data  provided  to  CDCIN  for  their  first  phase  of  work. 

•  Appendix  B  provides  a  series  of  plots  to  supplement  Section  2. 

The  references  appear  fol  I owi  ng  the  conci  usi  on. 
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2.  Data  Analysis  and  Development  of  Robust  Estimation 

Procedures 

In  thissection  wereport  against  the  foil  owing,  Phase  1,  activities: 

1.  Analysed  distribution  to  understand  heterogeneous  processes,  developing  an 
understanding  of  "typical"  impact  distribution  data  generated  by  DSTO. 

2.  Develop  robust  estimation  methods:  Investigated  EVT  (ExtremeValueTheory)  and 
other  methods  with  representative  distributions  for  evaluation  of  statistical  method 
effectiveness. 

The  data  generated  by  DSTO  covered  a  rangeofmissilelaunch  scenarios  and  failure  modes 
considered  to  be  typical  of  the  data  that  might  be  generated  for  actual  weapons  systems. 

The  completion  of  these  activities  involved  the  foil  owing  steps: 

1.  Investigation  of  techniques  suitable  for  generating  approximate  probability  density 
functi  ons  model  I  i  ng  missi  I  e  i  mpact  data. 

2.  For  a  given  dataset,  investigation  of  techniques  for  generating  "boundaries"  on  range 
sites  that  result  in  a  probability  of  injury  less  than  pre-determined  safety  levels. 

2.1  Data  Analysis 

2.1.1  The  Data 

Weapons  Systems  Di  vi  si  on  of  DSTO  generated  the  data  f  i  I  es  I  i  sted  i  n  A  ppendi  x  A ,  coveri  ng  a 
range  of  different  operational  scenarios. 

We  d  efi  ne  a  scenano  to  be  a  set  of  simulation  results  derived  usingthesame  input  data  except 
for  thetypeof  failure,  if  any,  that  takes  place  and  theseeds  of  noise  sources.  For  example,  two 
sets  of  simulation  results  corresponding  to  "no  failures  occurring"  and  "locking  of  fins  at 
some  time  during  the  missile  flight"  are  from  the  same  scenario  if  the  launcher  altitude  and 
velocity  and  target  altitude  and  velocity  are  unchanged  between  the  two  data  sets.  If  the 
I  au  n  ch  er  al  ti  tu  d  es  w  ere  d  i  ff  erent,  th  e  resu  Itswouldbesaidtoderivefromdi  ff  erent  seen  ari  os. 

For  agiven  scenarioand  failure  mode,  the  variables  used  from  the  data  generated  by  DSTO 
contai  n  i  nformati  on  descri  bed  i  n  TabI  e  1. 
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Table  1:  Description  of  variables  used  in  analysis 


Variable 

Description 

Su  ccessf  u  1 1  nterceptT  i  me 

The  expected  ti  me  taken  for  a  su  ccessf  u  1  i  ntercept  of  the 
target.  N  otethat  this  column  was  added  to  each  dataset  by 
TRC  based  on  the  times  given  in  the  written 
documentation  provided  by  DSTO.  The  times  provided 
were  to  two  significant  figures. 

Input:  FailureTime 

The(random)  time  at  which  the  fai  lure  occurred  during 
the  fl  i  ght  of  the  mi  ssi  1  e. 

Output:  ImpactPointX 

The  x-coord  i  nate  of  the  i  mpact  poi  nt  of  the  mi  ssi  1  e. 

Output:  ImpactPointY 

The  y-coord  i  nate  of  the  i  mpact  poi  nt  of  the  mi  ssi  1  e. 

In  order  to  illustrate  the  proposed  techniques  of  analysis  of  the  missile  impact  data,  these 
techniques  will  be  demonstrated  on  thedata  provided  for  a  particular  scenario.  Thedetai  Is  of 
this  scenario,  as  provided  by  DSTO,  are  given  in  Table  2.  Uni  ess  otherwise  stated,  all  plots 
provided  in  this  section  correspond  to  this  scenario.  Other  scenarios  were  analysed. 

Table  2:  Details  of  the  scenario  to  be  used  to  illustrate  the  proposed  techniques  of  analysis 


Parameter 

Value 

Launcher  and  target  altitude 

1500  metres 

Launcher  and  target  speed 

400  metres/  second 

Launcher  flight  direction 

North 

T arget  f  1  i  ght  d  i  recti  on 

South-West 

Target  location 

8km  N  orth,  2km  East  of  launcher  location 

Target  manoeuvre 

None 

Fi  gure  1  shows  a  scatter  pi  ot  of  20,000  i  mpact  poi  ntsfor  this  scenari  o,  w here  no  fai  I  u re  occu rs 
during  the  flight  of  the  missile.  These  data  points  were  provided  in  the  file 
'  nofau  I  t_ggm_000_l-20000.csv' . 


Impact  Point  Distribution 


5000  10000  15000  20000 

OttUfK*  from  mifsiio  ituncn  (m) 


Figure  1:  Impact  distribution  for  scenario  in  Table  2  where  no  failure  occurs  (failure  mode  0) 
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Figure  1  shows  that  there  is  a  high  degree  of  heterogeneity  with  i  n  thedataset,  i  ncl  udi  ng  two 
disti  net  areas  of  high  i  mpact  density,  and  a  considerable  amount  of  dispersion  of  the  impact 
points.  Discussions  with  Duncan  Fletcher  and  RobertGraham  of  Weapons  SystemsDivisi  on 
indicatethat  most  of  this  dispersion  isduetothemissilesmissingthetarget,  and  then  turning 
around  to  re-attack  thetarget.  It  can  also  be  seen  that  the  shapes  of  the  areas  of  high  density 
aresomewhat  irregular.  A  consequenceofthehighdegreeof  heterogeneity  and  irregularity  is 
that  thi  s  d  ata  may  not  be  w  el  I  mod  el  I  ed  by  a  mi  xtu  re  of  bi  vari  ate  normal  d  i  stri  buti  ons.  1 1  i  s  of 
particular  interest  that  for  this  scenario,  the  area  of  highest  density  occurs  very  near  the 
bou  nd  ary  of  the  convex  h  u  1 1  of  the  i  mpact  poi  nts.  This  may  be  i  mportant  i  n  the  anal  ysi  s  of  th  i  s 
data  since  it  may  cause  the  chosen  method  of  density  estimation  to  assign  considerable 
probabi  I  ity  density  to  regions  where  no  i  mpact  poi  nts  have  been  observed.  This  is  discussed 
further  in  a  later  section  of  this  report. 

Figure  2  shows  a  scatter  plot  of  20,000  impact  points  for  the  same  scenario,  where  at  some 
pointduringtheflightofthemissile,all  actuators  lock  to  zero  deflection.  Thisfai  I  ureoccursat 
arandomtimethatisuniformlydistributed  on  theinterval  0-15  seconds.  A  certain  subset  of 
these  impact  points  has  been  highlighted  in  blue.  This  is  explained  below.  These  data  are 
provided  in  thefile'faultset_all_actuators_zeroed_ggml-20000_6.csv'. 


Impact  Point  Distribution 


DiSMrK«  from  missile  launch  (cn) 

Figure  2:  Impact  distribution  for  scenario  in  Table  2  where  all  actuators  lock  to  zero  deflection 
(failure  mode  1) 

The  mai  n  features  of  Figure  2  are  si  mi  lar  to  those  of  Figure  1,  except  that  there  is  a  narrow 
area  of  very  high  impact  density  around  a  segment  of  the  launcher  trajectory  linaThisarea 
corresponds  to  the  subset  mentioned  abovethat  has  been  highlighted  in  blue.  Such  an  areain 
a  scatterplot  of  the  impact  points  has  been  referred  to  as  a  "hook"  in  discussions  with 
Weapons  Systems  Division.  I  n  this  case,  the  hook  appears  to  be  contained  entirely  within  the 
i  mpact  d  i  stri  buti  on ,  u  n  I  i  ke  the  area  ofhighdensityinFigurelwhichliesonthe  bou  nd  ary  of 
the  distribution.  There  is  also  a  higher  degree  of  dispersion  of  the  impact  poi  nts  in  the  zero 
deflection  case  than  in  the  no  failure  case.  This  may  be  due  to  the  additional  variation 
introduced  by  the  randomly  generated  failure  time.  As  in  Figure  1,  the  areas  of  high  impact 
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density  are  irregular  in  shape,  and  this  data  may  not  be  well  modelled  by  a  mixture  of 
bivariate  normal  distributions. 

It  has  been  determined  that  the  "hook"  discussed  above  consists  of  cases  in  which  failure 
occu  rs  pri  or  to  the  expected  ti  me  of  i  ntercept.  For  th  i  s  scenari  o,  the  expected  ti  me  of  i  ntercept 
is  7.4  seconds.  This  hook  has  been  extracted  from  the  dataset  accord!  ng  to  thefai  I  ureti  me,  and 
is  shown  in  Figure  3. 


Impact  Point  Distribution 


5000  10000  1S000  30000 


Oi$tanc«  from  missile  launcn  (m) 

Figure  3:  Impact  distribution  for  pre-intercept  failure  times  from  Figure  2 

Fi  gure  4  shows  the  distri  buti  on  of  i  mpact  poi  nts  when  fai  lure  occurs  after  ti  me  7.4  seconds. 


Impact  Point  Distribution 


5000  10000  15000  30000 


Dtstanc#  from  missii«  launch  (m) 

Figure  4:  Impact  distribution  for  post-intercept  failure  times  from  Figure  2 
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1 1  i  s  cl  ear  f  rom  F  i  gu  re  3  and  Fi  gu  re  4  that  the  d  i  stri  buti  on  of  the  i  mpact  poi  nts  i  s  very  d  i  fferent 
when  the  failure  occurs  before  the  time  of  intended  i  ntercept  than  when  it  occurs  after  this 
time.  Figure  3  shows  that  if  the  failure  occurs  before  this  time,  the  resulting  distribution 
contains  very  limited  dispersion.  1 1  appears  that  inthissituati  on,  the  mi  ssilesimply  continues 
along  a  trajectory  similar  to  its  initial  trajectory  and  eventually  impactstheground.  The  hook 
shown  in  Figure  3  is  clearly  a  region  of  high  impact  density.  If  this  hook  is  contained  entirely 
within  the  envelope  of  the  impact  points  shown  in  Figure  4,  the  chosen  method  of  density 
estimation  is  likely  to  provide  a  reasonable  estimate  near  the  boundary  of  the  impact 
distribution,  as  internal  regions  should  not  affect  the  boundary  regions  to  any  significant 
extent.  Flowever,  as  mentioned  above,  if  the  hook  lies  on  or  outside  the  boundary  of  this 
envelope,  it  may  cause  the  chosen  method  of  density  estimation  to  assign  considerable 
probability  density  to  regions  where  no  impact  points  have  been  observed.  This  will  be 
discussed  further  at  a  later  stage  of  this  report. 

2.1.2  Major  issues  in  Data  Modelling 

Fromthedescriptiveanalysisprovided  in  theprevioussubsection,  weseethatthefollowing 
major  issues  need  to  be  addressed  by  any  technique  attempting  to  reasonably  model  data  of 
the  form  provided: 

1.  The  data  are  heterogeneous.  I  n  this  case,  there  are  cl  early  i  dentifi  abl  e  regi  ons  withi  n 
the  impact  zone  corresponding  to  different  operational  modes.  For  example,  failure 
before  i  ntend  ed  i  ntercept  ti  me  can  resu  1 1  i  n  a  ti  ghti  y  d  ef  i  ned ,  hi  ghi  y  correl  ated ,  i  mpact 
zone.  Failureafter  intended  intercept  time  resultsin  a  much  morewi  despread  impact 
distribution,  but  still  having  distinct  regions  of  higher  density. 

2.  Fligh  density  impact  zones  can  occur  at  the  edge  of  the  "impact"  envelope  (see 
Fi gure  1),  or  more  central  ly  i  n  the  i  mpact  envel  ope  (see  Fi  gure  2). 

For  the  purposes  of  developing  a  RSIT,  high  density  zones  central  to  the  impact  envelope 
may  not  have  to  be  modelled  with  great  precision.  It  is  more  important  that  the  density 
esti  mate  correcti  y  esti  mates  the  overal  I  probabi  I  ity  of  i  mpact  i  n  those  regi  ons. 

1.  On  the  other  hand,  high  density,  sharply  defined  regions  on  the  edge  of  the  impact 
envelope  will  require  a  more  careful  treatment.  Correctly  recognising  a  sharp 
boundary  may  be  useful  in  assessing  the  risk  to  immediately  adjacent  regions. 
FI  owever,  the  consequences  of  i  ncorrectly  identify!  ng  such  a  boundary  could  be  severe 
and,forthisreason,  itisimportanttorecognisethelimitationsoftheavailabledata.  In 
particular,  even  if  a  sharp  boundary  is  present  in  all  simulated  scenarios,  it  may  not  be 
possibletopredictexactly  how  that  boundary  would  be  affected  by  small  violations  of 
thescenario  assumptions  as  are  bound  to  occur  in  practice.  For  this  reason  it  would  be 
prudent  to  allow  for  a  margin  well  beyond  that  indicated  directly  by  thedata. 

2.  Atthistimewehavenotbeen  provided  with  information  ontypical  MaximumEnergy 
Boundaries.  A  M  aximum  Energy  Boundary  defines  the  absolute  maxi  mum  range  of  a 
missile,  in  any  given  direction,  aslimited  by  issues  such  aslaunch  altitude,  velocity, 
weight  and  fuel  load.  Thus,  any  models  developed  for  the  impact  distribution  of  a 
missileshould  be  flexible  enough  to  ad  here  to  external  limits  imposed  by  a  specified 
M  axi mum  Energy  Boundary. 
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3.  Finally,  in  considering  models  for  the  data  and  their  interpretations,  it  is  essential  to 
note  that  any  such  model  I  i  ng  assu  mes  the  data  to  be  representati  ve  of  the  behavi  ou  r  of 
the  physical  system.  In  particular,  the  variation  apparent  in  the  data  is  duesolely  to 
variation  in  certain  input  parameters  such  as  failure  time,  wind-speed  and  seeker 
noise.  The  extent  to  which  the  data  is  representative  is  determined  by  the  extent  to 
w  hi  ch  the  vari  ati  on  of  the  i  nputs  represents  the  system  bei  ng  model  I  ed .  Throughout 
this  report  it  is  assumed  that  the  data  provided  are  representative  of  the  system 
intended  by  DSTO.  Although  this  assumption  is  critical,  by  its  nature  it  cannot  be 
tested  on  the  basis  of  the  data  alone.  Therefore  it  is  an  assumption  rather  than  a 
conclusion  of  this  report  that  the  data  are  representative  of  the  system  intended  by 
DSTO. 

2.2  Techniquesfor  generating  PDFs 

In  the  preliminary  phaseto  this  project,  TRC  Mathematical  Modelling  provided  a  report  on 
the  use  of  Extreme  Value  Theory  and  its  potential  application  to  determining  regions 
encapsulating  a  given  percentage  of  likely  impact  points  [1].  Under  this  approach,  precise 
model  I  i  ng  of  the  enti  re  i  mpact  area  is  not  requi  red  as  thefocus  is  on  the  "edge"  of  the  i  mpact 
zone  and  how  far  it  might  extend  in  any  particular  direction.  In  this  phase  of  the  project,  the 
emphasis  has  changed  to  enabi  i  ng  theRSlT  to  compute  the  expected  risk  of  harm  for  a  given 
test  tempi  ate  and  mapof  population  density.  This  requi  res  model  ling  theprobabi  I  ity  density 
functi  on  of  i  mpacts  across  the  enti  re  i  mpact  area  and  therefore  preci  udes  the  use  of  EVT. 

I  n  the  context  of  the  missi  le  i  mpact  distri  bution,  the  associated  probabi  I  ity  density  functi  on 
assigns  to  any  impact  point  (x,y)  a  probability  density.  An  intuitive  way  of  thinking  of  the 
probability  density  functi  on  isasthecontinuousanalogueoftheprobability  massfunction.A 
higher  probability  density  in  a  particular  region  indicates  a  higher  probability  of  the  missile 
I  and  i  ng  i  n  that  regi  on .  The  i  ntegral  of  a  probabi  I  i  ty  d ensi  ty  f u  ncti  on  over  al  I  possi  bl  e  i  mpact 
points  {x,y)  is  1.  It  should  be  noted,  however,  that  in  order  to  perform  the  calculations 
necessary  for  this  report,  such  as  the  calculation  of  the  expected  number  of  casualties,  the 
estimate  of  the  probability  density  function  is  calculated  over  a  discrete  grid  of  values. 
Therefore,  the  density  estimate  used  in  the  calculations  throughout  this  report  is  in  fact  a 
probability  mass  function  rather  than  a  probability  density  function.  It  should  also  be  noted 
that  during  the  process  of  generating  this  probability  mass  functi  on,  it  is  normalised  so  that 
the  sum  of  the  probabilities  over  all  grid  squares  is  equal  to  1.  Thus,  all  density  estimates  used 
i  n  the  anal ysi s  throughout  this  report  are  val  i  d  probabi  I  ity  mass  fu ncti  ons.  Thi  s  i  nformati  on, 
when  combined  with  information  on  other  external  factors,  such  as  population  density,  can 
then  be  used  to  generate  esti  mates  of  overal  I  i  njury  rates. 

2.2.1  Application  of  Kernel  Smoothing  to  PDF  Generation 

The  problem  at  hand  isto  esti  matethe  probabi  I  ity  density  functi  on  for  thei  mpact  distri  bution 
across  the  area  of  interest.  There  are  many  approaches  to  density  estimation  and  TRC 
Mathematical  Modelling  has  focussed  on  Kernel  Smoothing  (see  [2])  for  this  application.  In 
broad  terms,  methods  for  density  estimation  can  be  classified  either  as  parametric  or  non- 
parametric. 
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Parametric  methods  rely  upon  correctly  specifying  a  family  of  distributions,  such  as  the 
bivariate  Gaussian,  and  then  adjusting  the  parameters  of  that  distribution  to  fit  the  data. 
Although  parametric  methods  could  be  expected  to  providethe  highest  statist!  cal  efficiency, 
they  are  not  appi  i  cabi  e  i  n  th  i  s  case  because  of  the  compi  exi  ty  of  the  d  ata.  I  n  parti  cu  I  ar,  none  of 
the  avai  I  abl  e  fami  I  i  es  of  parametri  c  d  ensi  ti  es  i  s  su  i  tabi  e. 

N  on-parametric  methods  include  histogram  methods,  kernel  density  estimation  and  various 
series  expansions.  Unlike  parametric  estimation,  non-parametric  methods  do  not  assume  a 
particular  form  for  the  impact  distribution.  Histogram  methods  are  extremely  simple  to 
i  mpl  ement  and  i  nvol  vethefewest  assumpti  ons  about  the  i  mpact  distri  but!  on.  H  owever,  they 
do  not  smooth  or  interpolate  between  points  and  hence  are  not  suitable.  Kernel  density 
estimation  and  series  expansions  both  providefor  smoothing  and  interpolation  between  data 
points.  It  was  decided  to  focus  on  kernel  density  estimation  for  practical  reasons;  namely,  the 
method  has  been  extensively  studied  [[3, 4]  and  other  references],  its  theoretical  properties  are 
well  understood  and  efficient  implementations  arewidely  available. 

Kernel  Smoothing  works  in  the  following  way  (taken  from  [2]): 

A  probability  density  function/of  a  random  variableX  can  bedefined  by 

fix)  =  lim-y^(x  -h<X<x  +  h), 

/i->o  2n 


where/7  is  a  constant.  For  any  given/?,  we  can  estimateP(x  -h<X  <x  +  h)  by  the  proportion 
of  the  sample  falling  in  the  interval  ix-h,x  +  h)  .Thus,  a  natural  estimator  of  the  density  is 
given  by 

1 

fix)  = - X  number  of  observations  falling  mix-h,x-\-  h). 

2hn 


This  is  known  ds  the  naive  estimator,  and  can  be  expressed  more  generally  as 


Ax)  =  -tlx^ 

h 


V  A  J 


where  w(-  )  is  a  weight  function  given  by 


wix)  = 


^  2 
0 


if  |x|  <  1, 
otherwise. 


It  is  easy  to  generalise  this  estimator  by  replacing  the  weight  function  w(-)  with  a  kernel 

function  K(-)  that  satisfies  the  condition 

00 

I  Kix)dx  =  1. 
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Usually,  K(- )  will  beasymmetricprobability  density  function. Thus, theterndesfimatorwith 
kernel  K(-  )  is  defined  by 


nht;  \ 


h 


where/7  isknown  as  thefcanrfwzrff/z  parameter.  Theidea  of  thekernel  density  estimator  is  that  it 
smooths  theraw  data  into  a  probability  density  function.  That  is,  it  spreads  the  weight  of  each 
data  poi  nt  over  a  wi der  area  (determi  ned  by  the  kernel  functi  on)  so  that  it  "fi  I  Is  i  n"  the  gaps 
between  each  data  point,  whilst  assigning  greatest  probability  density  to  the  areas  with  the 
greatest  concentration  of  data  points. 


The  naive  estimator  can  be  considered  as  the  sum  of  equal-sized  'boxes',  with  each  box 
centred  at  an  observed  data  poi  nt.  I  n  thesame  way,  the  kernel  esti  mator  can  be  consi  dered  as 
asumof  smooth  'bumps'  placed  at  the  observations.  The  kernel  determines  the  shape  of  the 
bumps  while  the  bandwidth  h  determines  their  width.  If  the  selected  bandwidth  is  too 
n  arrow ,  th  ese  bu  mps  w  i  1 1  not  over  I  ap,  an  d  th  e  resu  I  ti  ng  d  ensi  ty  esti  mate  w  i  1 1  be  a  col  I  ecti  on  of 
isolated  bumps  of  probability  density.  This  may  also  occur  if  the  data  are  too  sparse.  On  the 
other  hand,  if  the  bandwidth  istoo  large,  each  pointwill  bespread  over  avery  large  area,  and 
consi  derabi  e amounts  of  probabi  I  ity  density  w  i  1 1  beal  I  ocated  to  areas  where  no  data  has  been 
observed.  Thus,  it  is  important  to  select  an  appropriate  bandwidth. 


The  2-dimensional  generalisation  of  the  kernel  estimator  is  given  by 


f(x,y)  = 


1 


x-X,  y-r, 


y  j 


whereK(- )  is  now  a  2-dimensional  kernel  function. Clearly,thereareanumber  of  thingsto  be 
determined  before  using  the  kernel  density  estimator.  Firstly,  one  must  choose  the  kernel 
function  K(- ).  Secondly,  in  practice,  thedensity  esti  mate  is  computed  over  a  discrete  grid  by 
evaluating  the  formula  given  above  at  each  point  on  the  grid.  Thus,  it  is  also  necessary  to 
determi  ne  the  d i  mensi  ons  of  thi s  gri  d .  Throughout  thi  s  report,  thi  s  w i  1 1  be  referred  to  as  the 
granularity  of  the  density  esti  mate.  Finally,  and  importantly,  one  must  choose  the  values  of 
the  two  bandwidth  parameters,  hx  and  hyAnsd  ecti  ng  the  bandwidths,  there  are  a  number  of 
methods  from  which  to  choose.  For  2-dimensional  kernel  density  estimation,  the  most 
common  methods  of  selecting  the  bandwidth  are  to  apply  a  certain  standard-distribution 
based  formulaorto  use  cross-validati  on. Theseissues  are  addressed  in  Section  2.2.3. 


2.2.2  Application  of  Kernel  Smoothing  to  the  M  issile  Impact  Data 

By  applying  kernel  density  estimation  to  the  missile  impact  data  and  performing  certain 
mani  pul  ati  ons  on  the  kernel  density  esti  mates  obtai  ned,  it  is  possi  bl  eto  obtai  n  an  esti  mate  of 
the  mi ssi  I  e  i  mpact  d i  stri  buti  on  that  appears  to  be  reasonabi  e. 
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This  process  has  been  carried  out  for  thescenario  described  inTabie2,  and  afinai  estimate  of 
the  mi  ssi  i  e  i  mpact  d i  stri  buti  on  was  obtai  ned .  A  representati  on  of  thi  s fi  nai  esti  mate  i s  show  n 
in  Figures,  beiow.  Thefigure  shows  the  probabiiity  mass  function  obtained  through: 

•  Kernei  smoothing  the  data  provided. 

•  Appiying  an  artificiaiiy  created  (circuiar)  Maximum  Energy  Boundary  to  the 
probabiiity  mass  function  created. 

•  Usi  ng  different  coi  ours  i  n  thefi  gureto  represent  different  i  eveis  of  probabi  i  ity  density. 

The  iegend  at  the  ieft  of  the  piot  gives  an  indication  of  the  ievei  of  density  to  which  the 
vari  ouscoi  ours  correspond.  Note  that  duetotheiargenumber  of  grid  squares  into  which  the 
area  of  thedensity  is  divided,  theieveisof  probabiiity  assigned  to  each  grid  square  are  very 
smaii.  For  this  reason,  the  piots  of  the  kernei  density  estimates  in  this  report  show  the  iog 
(base  10)  of  the  density.  Thus,  for  exampie,  a  grid  square  of  thesame  coi  our  as  thefi  rst  square 
on  the  iegend  indicates  an  estimated  probabiiity  density  between  0.01  and  0.001.  Areas  with 
estimated  density  iessthan  10^20  have  si  mpiy  been  coioured  white.  Theapparent  truncation  of 
the  probabi  i  ity  density  esti  mate  i  n  the  top  i  eft  and  bottom  i  eft  regi  ons  is  d  ueto  appi  i  cation  of 
the  ci  rcu  i  ar  M  axi  mu  m  E  nergy  Bou  nd  ary  used  i  n  th  i  s  exampi  e.  i  t  i  s  i  mportant  to  note  that  the 
kernei  density  esti  mate  piots  shown  i  n  this  report  depi  cttheprobabi  i  ity  density  per  unit-ceii. 
To  caicuiate the  probabiiity  per  metre-squared  the  probabiiity  density  in  each  ceii  must  be 
divided  by  the  area  of  the  cei  i . 

The  step-by-step  process  by  which  this  esti  mate  was  obtained  isnow  expiained. 


Logl  0  of  Kernel  Density  Estimate 
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Figure  5:  Final  kernel  density  estimate  for  the  scenario  described  in  Table  2 

Thefi  rst  step  toward  obtai  ni  ng  an  overai  i  kernei  density  esti  mate  for  a  parti  cui  ar  scenari  0  isto 
computeindividuai  kernei  density  esti  matesfor  each  faiiuremodefor  that  scenario.  FigureO 
and  Figure7show  thekernei  density  esti  mates  for  the  data  shown  in  Figureland  Figure  2, 
respectiveiy,  which  correspond  tofaiiuremodesO(nofaiiure)  and  l(zero  defiection)  for  the 
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scenari  o  descri  bed  i  n  Table  2.  The  actual  I  mpact  pol  nts  are  also  shown  I  n  order  to  1 1 1  ustratethe 
quality  of  the  density  estimate. 

For  the  kernel  density  estimates  shown  In  Figure  6  and  Figure?,  the  bandwidth  parameter 
was  chosen  according  to  a  formula  discussed  In  detail  In  Section  2.2.3.3  and  the  kernel 
estimator  was  applied  on  a  256x256  grid  overlaid  on  the  dataset.  The  area  for  which  the 
density  estimate  was  computed  for  this  scenario  was  determined  from  the  range  of  the 
combi  ned  data  for  both  fal  lure  modes.  Thel  ndl  vl  dual  kernel  density  esti  mates  for  each  fal  lure 
mode  of  that  scenario  were  computed  on  a  common  grid  In  order  to  be  able  to  calculate  a 
weighted  average  of  the  distributions  of  the  various  failure  modes  for  that  scenario.  This 
weighted  average  Is  discussed  and  calculated  In  Section  2.3.1,  below.  The  overall  x-rangeof 
the  density  estimate  for  the  scenario  to  which  Flgure6and  Figure?  correspond  ls(-6?2?.9, 
32693.8),  theoveral  I  y- range  Is  (-25956.4, 22538.2),  and  the  d I  mansions  of  the  rectangl  es  I  n  the 
discretisation  of  the  Impact  distribution  for  the  Individual  failure  modes  are  153.99  m  by 
189.43  m.  This  will  also  be  the  discretisation  of  thewelghted  average. 


Log1 0  or  Kernel  Density  Estimate 


Figure  6:  Kernel  density  estimate  for  scenario  in  Table  2  where  no  failure  occurs  (failure  mode  0) 

Itcan  be  observed  from  Flgure6thatthekernel  density  estimate  appears  to  bea  reasonable 
estimate  of  the  underlying  density  that  generated  the  Impact  points  shown.  Since  the  vast 
majority  of  the  I  mpact  pol  nts  fal  I  In  one  of  two  areas  of  especially  high  Impact  density,  the 
estI  mated  probabi  I  Ity  density  I  n  these  two  areas  Is  much  greater  than  I  n  theremal  nl  ng  area  of 
the  plot.  A  further  result  of  thetwo  areas  of  very  high  I  mpact  density  Is  that  the  bandwidth 
chosen  Isquitesmall,  and  thereforethetallsofthekernel  density  esti  matedecay  quite  rapidly. 

As  discussed  above,  a  possible  consequence  of  not  having  enough  data  or  choosi  ng  too  smal  I 
a  bandwidth  Isthat  the  resulting  kernel  density  esti  mate  Is  a  collection  of  Isolated  bumps  of 
probability  density,  corresponding  to  the  data  points.  In  Figure  6,  above,  the  kernel  density 
esti  mate  appears  to  be  a  reasonabi  e  esti  mate  I  n  most  areas  of  the  pi  ot,  but  I  n  the  areas  w  here 
the  data  are  sparse.  Isolated  bumps  of  density  can  be  observed.  The  Issue  of  bandwidth 
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sd  ecti  on  i  s  further  d  i  scussed  bd  ow ,  as  i  s  the  i  ssue  of  assess!  ng  w  hether  or  not  the  dataset  i  s  of 
sufficient  size. 


Log1 0  or  Kernel  Density  Estimate 


I - 1 - 1  I - 1 - ' 
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Distance  from  mssile  launch  (km) 

Figure  7;  Kernel  density  estimate  for  scenario  in  Table  2  where  all  actuators  lock  to  zero  deflection 
(failure  mode  1) 

The  kernd  density  estimate  in  Figure  7  also  appears  to  be  a  reasonable  estimate  of  the 
underlyi  ng  density  that  generated  the  i  mpact  poi  nts  show  n.  The  areas  of  greatest  esti  mated 
probabi  I  ity  density  correspond  cl  osd  y  to  the  areas  of  greatest  i  mpact  density .  The  dispersi  on 
of  the  impact  points  is  greater  in  the  zero  deflection  case  than  in  the  no  failure  case. 
Consequently,  the  bandwidth  chosen  is  larger,  and  the  tails  of  the  kernd  density  estimate 
decay  I  ess  rapi  d  I  y  i  n  the  zero  d  ef  I  ecti  on  case.  FI  ow  ever,  the  rate  of  d  ecay  i  n  the  zero  d  ef  I  ecti  on 
caseisstill  quite  rapid  sincethisisan  inherent  feature  of  the  bivariate  normal  kernd. 

The  code  used  to  analyse  the  data  provide  by  DSTO  and  to  generatethe  plots  in  this  report 
was  devd  oped  by  TRC  Mathematical  M  odd  I  i  ng  in  the  "R"  statist!  cal  analysis  package.  R  is  a 
freeware  software  package  aval  I  able  for  download  from  www.r-project.org.  It  has  built-in 
routines  for  handling  large  datasets  and  applying  kernd  smoothing  techniques  to  2- 
dimensional  data,  such  as  the  impact  point  data  generated  by  DSTOs  missile  flight 
simulations. 

Looking  atthefigures,  weobservethefollowing  key  features: 

1.  Asdescribed  earlier,  kernd  density  estimation  "fillsin"  areas  of  low  impact  density 
that  exist  physically  between  two  high  density  impact  regions.  See  Figure  7  for  an 
example.  The  amount  of  in-fill,  or  spreading,  is  controlled  by  the  bandwidth 
parameter.  Thefol  I  ow  i  ng  secti  on  provi  des  a  detai  I  ed  coverage  of  how  the  band  w  i  dth 
parameter  affects  the  probabi  I  ity  density  functi  on  generated. 

2.  When  a  high  density  impact  area  exists  on  the  edge  of  the  envdope  of  the  impact 
points,  kernd  smoothing  spreads  a  portion  of  the  distribution  to  the  area  outside  of 
the  envdope.  See  Figure  6  for  an  example.  If  the  simulated  data  are  bdieved  to 
provideavery  accurate  representation  oftheactual  impact  distribution  that  could  be 
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expected  in  practice,  therewould  bean  argument  for  applying  a  smaller  bandwidth 
parameter  to  points  in  those  impact  regions.  In  effect,  this  would  "tighten"  the 
probability  density  function  generated  to  moreclosely  represent  thedata.  Ontheother 
hand,  if  there  is  any  doubt  about  the  integrity  of  theinput  data,  then  such  a  tightening 
would  bea  more  aggressive,  rather  than  conservative,  approach  to  the  development  of 
range  safety  tempi  ates. 

3.  Anotherissuetoconsiderindeterminingtheeffectivenessof  kernel  density  estimation 
i  s  the  si  ze  of  the  d ataset.  C I  earl  y,  i  f  the  nu  mber  of  d ata  poi  nts  aval  I  abl  e  i  s  too  smal  I ,  i  t 
is  impossible  to  obtain  a  good  estimate  of  the  underlying  density.  As  mentioned 
earlier,  a  possible  consequence  of  this  is  that  the  resulting  kernel  density  estimate 
consists  of  a  col  I  ecti  on  of  isolated  bumps  of  probabi  I  ity  density  each  correspond!  ng  to 
a  data  point.  This  issue  isfurther  complicated  by  the  fact  that  the  given  data  exhibit  a 
high  degree  of  heterogeneity  in  the  density  in  different  areas  of  the  impact 
di  stri  buti  on.  Thus,  w  hi  I  ethere  may  be  pi  enty  of  data  aval  I  abl  eto  esti  mate  the  density 
in  some  areas  of  the  impact  distribution,  thedata  may  be  very  sparsein  other  areas 
which  do  in  fact  contain  a  significant  amountof  probability  density. 

Items  1  and  2,  in  particular,  highlight  that  while  kernel  smoothing  has  many  useful 
character!  sties  with  regards  to  generating  probability  density  functions  from  simulated  data, 
its  use  in  the  development  of  range  safety  templates  requires  careful  application  and  the 
revi  ew  of  I  n  put  data  by  appropriate  subject  matter  experts  i  n  the  weapons  area.  Theselection 
of  appropri  ate  kernel  smooth!  ng  input  parameters  i  s  not  si  mpl  y  a  matter  of  stati  sti  cal  anal  ysi  s 
and  blind  application  of  existing  formulae  for  selection  of  the  bandwidth  parameter. 

2.2.3  Issues  in  the  Application  of  Kernel  Smoothing 

22.3.1  Choice  of  Kernel 

As  mentioned  in  Section  2.2.1,  there  are  a  number  of  decisions  that  need  to  be  made  in  order 
to  apply  kernel  density  estimation.  Thefirst  of  these  isthechoice  of  the  kernel  function  itself. 
A  common  choice  of  bivariate  kernel  function  is  the  bivariate  normal  density  function.  There 
are  a  number  of  other  kernel  functions  that  may  be  used,  some  of  which  are  more 
computati  onal  I  y  eff  i  ci  ent  than  the  normal  kernel .  H  owever,  computati  onal  effi  ci  ency  has  not 
presented  any  significant  problems  in  the  implementation  ofthekernel  density  estimator. 

A  second  issuewiththenormal  density  function  isthatthetail  decaysfasterthane''.Thus,the 
tai  I  s  of  the  resu  I  ti  ng  kernel  densi  ty  esti  mate  al  so  d  ecay  exponent!  al  I  y .  I  n  the  case  of  the  mi  ssi  I  e 
I  mpact  d  i  stri  buti  on ,  the  i  nterpretati  on  of  thi  s  i  s  that  as  the  i  mpact  I  ocati  on  moves  aw  ay  from 
the  regions  of  high  impact  density,  the  probability  density  associated  with  that  impact 
I  ocati  on  decays  very  rap!  dly.  Thi  s  may  be  an  undesi  rabi  efeature  of  an  esti  mate  of  the  i  mpact 
densi  ty,  si  nceit  may  imply  that  impact  locati  onsjustoutsidetheenvel  ope  of  thei  mpact  points 
observed  in  the  given  dataset  have  very  small  associated  impact  density.  It  may  be  more 
desirableto  associate  higher  impact  densities  with  these  poi  nts.  However,  this  issue  is  also 
related  to thechoice of  bandwidth,  si ncea  larger  bandwidth  createsalongertail.Theissueof 
band  w  i  dth  sel  ecti  on  i  s  f  u  rther  d  i  scussed  bel  ow ,  but  f  or  now ,  i  t  i  s  suff  i  ci  ent  to  note  that  rapi  d 
decay  of  the  tai  I  of  the  normal  kernel  can  be  overcome,  to  some  extent,  by  an  appropriate 
choice  of  bandwidth. 
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One  advantage  of  the  normal  kernel  inapplicationssuchastherangesafety  projectisthatitis 
mu ch  more  I  i  kel  y  to  be fami  I  i  ar  to  non-stati  sti  ci  ans,  and  i  ts  properti  es  are  much  more  w  i  d el  y 
known.  Since  thetwo  major  issueswith  the  normal  kernel  can  both  be  overcome,  thekernel 
function  used  for  the  anal  ysisthroughoutthisreportisthebivari  ate  normal  density  function. 

22.3.2  Granularity  and  Range 

A  second  aspect  to  be  determined  isthegrid  overwhich  thedensity  estimateiscomputed.  In 
order  to  determine  this  grid,  it  is  sufficient  to  determine  the  range  over  which  thedensity 
estimateisto  be  computed  and  the  dimensions  of  the  grid  (that  is,  the  granularity). 

For  the  analysis  throughout  this  report,  the  range  over  which  the  density  estimate  is 
computed  has  been  determined  by  taking  the  range  of  the  impact  point  data  and  adding  a 
border  of  width  10km  around  the  outside.  In  all  of  the  kernel  density  estimates  shown 
throughout  this  report,  it  can  be  seen  that  the  probability  density  at  any  point  outsidethis 
range  i  s  I  ess  than  IQ20.  Therefore,  i  t  has  been  d  eci  d  ed  that  thi  s  i  s  a  suff  i  ci  enti  y  w  i  d  e  range  over 
which  to  compute  the  kernel  density  estimate.  In  a  subsequent  step  of  the  procedure  for 
obtaining  an  estimate  of  the  impact  density  for  a  particular  scenario,  a  maximum  energy 
boundary  will  be  applied  to  the  kernel  density  estimate.  The  application  of  a  maximum 
energy  boundary  sets  the  estimated  probability  density  in  all  grid  squares  outside  of  the 
boundary  to  0.  Therefore,  provided  that  the  range  of  the  original  kernel  density  estimate  is 
I  arge  enough  to  contai  n  the  enti  re  maxi  mum  energy  boundary,  the  effect  of  the  range  on  the 
kernel  density  estimate  will  be  negligible.  In  fact,  x-  and  y-ranges  of  the  maximum  energy 
boundary  may  be  appropriate  choices  for  the  range  of  the  grid  over  which  to  compute  the 
kernel  density  estimate.  Maximum  energy  boundaries  are  further  discussed  in  Section  2.3.2. 

I  n  determi  ni  ng  the  granul  arity  of  the  esti  mate,  there  are  two  competi  ng  factors  to  consi  der. 
Firstly,  it  is  cl  ear  that  a  finer  discretisation  will  provi  deafiner  esti  mateof  thedensity  function. 
FI  owever  there  are  also  limits  imposed  on  the  granularity  by  computational  issues.  For  the 
analysis  described  in  this  report,  the  granularity  was  chosen  such  that  a  sufficiently  fine 
esti  mate  was  obtai  ned  w  hi  I  st  mai  ntai  ni  ng  a  reasonabi  e  computati  on  ti  me.  The  granu  I  ari  ty  of 
each  density  esti  mategiven  inthisreportis256x256.  FI  owever,  the  software  used  does  allow 
the  user  to  select  a  granularity  of  their  own  choice  (for  example,  128  or  512). 

2. 2. 3. 3  Bandwidths 

Thefi  nal  choice  to  be  made  is  the  val  ues  of  the  bandwi  dth  parameters.  I  n  many  ways,  this  is 
the  most  i  mportant  choi  ce  si  nee "  both  theory  and  practi  ce  suggest  that  choi  ce  of  kernel  i  s  not 
crucial  to  the  statistical  performance  of  the  method  and  therefore  it  is  quite  reasonable  to 
chooseakernel  for  computational  efficiency"  [3].  Themost  common  methods  of  selecting  the 
bandwidth  are  to  use  one  of  two  common  standard-distribution  based  formulae,  or  to  use 
cross- val  i  d  ati  on .  These  stand  ard-d  i  stri  buti  on  based  f  ormu  I  ae  are  referred  to  i  n  the  I  i  teratu  re 
as  "rules  of  thumb".  Although  this  choice  of  term  suggests  that  the  formulae  may  not  be 
appropriate  for  range  safety  purposes,  they  are  in  fact  quite  valid  as  long  as  certain 
assumptions  about  thedata  hold.  See  section  3.4.2  of  reference  [3]  for  details. 

The  rul  es  of  thumb  given  i  n  the  I  iterature  actual  ly  apply  to  1-di  mensi  onal  datasets.  They  can 
be  applied  separately  to  each  dimension  of  a  2-dimensional  dataset,  but  this  is  not  always 
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appropriate.  The  first  rule  of  thumb  is  to  select  the  bandwidth  according  to  the  following 
formula: 


h  =  1.06  X  min 


cr, 


1 

xn  ^ , 


w  here  I Q R  i  s  the  i  nterquarti  I  e  range  of  the  d i  stri  buti  on . 

A  theoretical  derivation  ofthisruleisgiven  in  [2,  pg45].Thesecond  ruleofthumbisasimple 
variation  of  the  first,  where  the  factor  of  1.06  is  replaced  by  0.9.  The  motivation  behind  this 
variation  isthatwith  afactorof 0.9,theerrorofthedensityestimatewill  bewithin  10%ofthe 
minimum  error.  However,  it  is  more  common  to  use  the  first  rule  of  thumb. 

Theobjectiveofthecross-validation  methodsisto  choose  the  bandwidth  that  gives  the  best  fit 
to  the  data.  The  data  are  partitioned  intoanumber  of  equal-sized  subsets.  One  of  the  subsets 
is  removed  from  the  dataset,  and  the  remaining  data  are  used  to  calculate  a  kernel  density 
esti  mate.  Thi  s  i  s  repeated  for  each  su  bset,  and  a  measure  of  the  error  of  the  esti  mate  based  on 
the  removed  subset  is  calculated.  This  procedure  is  repeated  for  a  number  of  different 
bandwidths,  and  the  bandwidth  ischosen  to  givetheminimum  error.  This  isci early  a  useful 
meth  od  of  ch  oosi  n  g  th  e  ban  d  w  i  d  th .  1 1  al  so  h  as  th  e  ad  vantage  th  at  i  t  may  be  u  sed  to  sel  ect  th  e 
best  2-dimensional  bandwidth,  rather  than  choosing  the  bandwidth  independently  for  each 
d  i  mensi  on .  H  ow  ever,  i  t  i  s  consi  d  erabi  y  more  d  i  ff  i  cu  1 1  to  i  mpl  ement,  and  i  f  reasonabi  e  resu  I  ts 
can  be  obtained  using  a  simpler  method,  it  may  not  be  necessary  to  expend  the  additional 
effort. 

For  the  given  data,  the  first  rule  of  thumb  appears  to  give  reasonable  results.  Si  nee  it  is  the 
most  common  method,  and  the  simplest  to  implement,  the  bandwidths  used  to  obtain  the 
kernel  density  estimates  throughout  this  report  have  been  chosen  using  the  first  rule  of 
thumb. 

A  further  possibility  in  bandwidth  selection  is  known  as  an  adaptive  bandwidth,  or  dynamic 
bandwidth.  The  idea  is  that  different  bandwi  dths  may  be  used  for  different  regions  of  the  (x,y) 
area  of  the  density  estimate.  A  number  of  dynamic  bandwidth  techniques  have  been 
d  evel  oped ,  but  to  i  mpl  ement  su  ch  a  method  effect!  vel  y  w  ou  I  d  take  a  consi  d  erabi  e  amou  nt  of 
additional  implementation  and  validation.  To  effectively  implement  such  a  method  is 
th eref 0 re  n ot  f easi  bl  e  w  i  th  i  n  th e  ti  mef rame  of  the  cu  r rent  p roj ect. 

2. 2. 3. 4  Size  of  Dataset 

Before  appi  yi  ng  kernel  density  esti  mati  on,  it  is  necessary  to  ensure  that  thedataset  contai  ns  a 
sufficient  number  of  points  to  obtain  a  reasonable  density  estimate.  The  smallest  dataset 
provided  by  DSTO  from  which  a  kernel  density  estimatewasto  be  obtained  was  of  10,000 
points.  However,  unless  stated  otherwise,  the  kernel  density  esti  mates  shown  throughoutthis 
report  are  based  on  datasets  of  20,000  poi  nts.  A  possi  bl  e  way  of  determi  ni  ng  the  number  of 
points  required  to  obtain  an  acceptable  kernel  density  estimate  is  to  obtain  independent 
samples  of  various  sizes,  and  generate  kernel  density  esti  mates  for  each.  Assuming  that  the 
largest  sample  is  sufficient  to  obtai  n  an  accurate  kernel  density  esti  mate,  the  kernel  density 
estimates  based  on  the  smaller  samples  can  be  compared  to  that  of  largest  sample.  If  the 
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kernel  d  ensi  ty  esti  mates  based  on  sampi  es  of  a  certai  n  si  ze  are  suff  i  ci  enti  y  si  mi  I  ar  to  the  kernel 
density  esti  mate  based  on  the  I  argest  sampi  e,  and  I  i ttl  e  i  s  gai  ned  by  usi  ng  a  sampi  e  I  arger  than 
this  size,  then  this  may  indicate  that  this  is  a  sufficient  number  of  data  points  to  obtain  an 
acceptable  kernel  density  estimate. 

An  approach  similar  to  that  described  above  has  been  applied  to  a  dataset  of  50,000  points. 
H  owever,  with  only  50,000  poi  nts,  it  is  i  mpossi  bleto  obtai  n  many  i  ndependent  sampi  es.  For 
this  reason,  the  samples  used  in  this  investigation  are  subsets  of  the  50,000-point  dataset. 
Thus,  the  corresponding  kernel  density  estimates  may  be  more  similar  to  the  50,000-point 
esti  mate  than  would  beexpected  if  thesampleswereindependent.  FI  owever,  itisstill  possible 
to  gai  n  some  i  nsi  ght  i  nto  the  number  of  poi  nts  requi  red  to  obtai  n  an  accurate  kernel  density 
estimate. 

For  the  scenario  described  in  Table  2,  only  20,000  points  were  available.  FI  owever,  for  a 
different  scenario,  a  dataset  of  50,000  points  was  available.  Therefore,  the  dataset  of  50,000 
points  has  been  used  in  this  investigation.  In  fact,  the  scenario  from  which  the  dataset  of 
50,000  points  was  derived  is  a  reflection  in  the  x-axis  of  the  scenario  described  in  Table  2. 
Consequently,  a  degree  of  symmetry  can  be  observed  between  both  the  scatterplots  and 
kernel  density  esti  mates  for  thesetwoscenarios.Thissymmetry  will  bediscussed  laterinthis 
report. 

Firstly,  the  kernel  density  has  been  computed  for  the  dataset  of  50,000  observations.  For 
samples  sizes  5000  -  30,000  points,  at  intervals  of  5000,  20  subsets  of  each  size  have  been 
randomly  selected,  and  a  kernel  density  esti  mate  has  also  been  generated  for  each  subset.  The 
scatterp  I  ot  an  d  kernel  d  ensi  ty  esti  mate  f  or  th  e  enti  re  d  ataset  of  50,000  observati  ons,alongwith 
typical  scatterplots  and  kernel  density  esti  mates  for  each  samplesizearegiven  in  AppendixB. 
A  ppendixB  also  contains  plots  of  10"^  exclusion  zonesfor  each  ofthekernel  density  estimates. 

For  each  size,  the  20  subsets  of  that  size  have  been  used  to  compute  an  'average'  kernel 
density  esti  mate  for  that  si  ze.  This  gi  ves  an  i  ndi  cati  on  of  the  'average'  kernel  density  esti  mate 
that  might  be  calculated  from  a  sample  of  that  size  (assuming  that  the  sample  of  50,000  is  a 
good  representation  of  the  impact  distribution).  Each  average  kernel  density  estimate  was 
then  compared  with  the  50,000-point  kernel  density  estimate. 

The  measure  of  difference  used  for  the  comparison  of  the  average  kernel  density  estimates 
with  the50,000-poi  nt  esti  mate  was  the  sum  of  theabsol  ute  differences  at  each  grid  square.  It  is 
notdifficuitto  see  that  for  two  identical  distributions,  this  measure  will  be  equal  to  0,  and  as 
two  distributions  become  more  and  more  different,  this  measure  will  increase.  Throughout 
thisreport,  the  difference  between  an  average  kernel  density  esti  mate  and  the  50,000-point 
esti  mate  usi  ng  this  measure  wi  1 1  be  referred  to  as  a  density  difference. 

TableBshowsthedensity  differences  for  average  kernel  density  esti  mates  for  varioussample 
sizes.  Table  3  also  shows  both  the  absolute  and  relative  marginal  decrease  in  density 
difference  as  the  sample  size  increases.  These  quantities  give  an  indication  of  the 
i  mprovement  i  n  the  kernel  density  esti  mate  gai  ned  by  usi  ng  a  I  arger  sampi  e.  For  conveni  ence, 
the  absolute  and  relative  differences  have  also  been  plotted  againstthenumber  of  points,  and 
are  shown  in  Figure  8  and  Figure  9. 
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N  ote  that  whilst  both  the  ratio  of  density  differences  and  the  relative  decrease  in  density 
difference  are  both  relativeto  the  density  difference  with  a  further  5000  points,  they  do  not 
represent  the  same  quantity.  The  formul  ae  by  whi  ch  each  of  these  quantiti  es  was  cal  cul  ated 
are  given  below. 


„  .  .  ^  Density Difference(n) 

Ratio  of  Density  Difference  = - , 

Density  Difference(n  -  5000) 


and 


„  r  ■  rs  ■  rs  ■  T^-rr  Dcnstty  Differenccfn  -  5000)  -  Dcnstty  Differenccfn) 

Relative  Decrease  in  Density  Difference  = - , 

Density  Difference(n  -  5000) 

where  Density  Dijference(n)  represents  the  density  difference  for  an  n-point  average  kernel 
density  estimate.  Whereas  the  ratio  of  density  differences  gives  the  size  of  the  difference 
relativeto  the  difference  with  5000  fewer  points,  the  relative  deer  ease  in  density  difference 
gives  the  decrease  in  density  difference  achieved  by  adding  a  further  5000  points.  In  fact,  it 
can  be  seen  from  theformulae  above  that  these  two  quantities  are  related  by  the  equation 

Ratio  of  Density  Difference  =  1  -  Relative  Decrease  in  Density  Difference. 


Table  3:  Differences  between  average  kernel  density  estimates  and  50,000-point  kernel  density 
estimate  for  various  sample  sizes 


Sample  Size 

Density  Difference 
from  50,000-point 
estimate 

A  bsoiute  Decrease 
in  Density 

Difference 

Ratio  of  Density 
Difference 

Relative  Decrease  in 
Density  Difference 

5000 

0.30563153 

NA 

NA 

NA 

10,000 

0.2134398 

0.09219173 

0.698356613 

0.301643387 

15,000 

0.1582162 

0.0552236 

0.741268498 

0.258731502 

20,000 

0.11973258 

0.03848362 

0.756765616 

0.243234384 

25,000 

0.09010242 

0.02963016 

0.752530514 

0.247469486 

30,000 

0.06645264 

0.02364978 

0.737523365 

0.262476635 

In  considering  these  results,  it  should  denoted  thatthedifferencebetweentheaveragekernel 
density  estimate  and  the  50,000-point  kernel  density  estimate  do  not  necessarily  reflect  the 
size  of  the  difference  that  could  be  expected  for  a  kernel  density  estimate  obtained  from  a 
single  sample  of  the  given  size.  In  particular,  it  could  be  expected  that  the  errors  associated 
w  i  th  a  si  ngl  e  sampi  e  w  ou  I  d  be  I  arger  that  those  associ  ated  w  i  th  the  average  of  20  i  nd  epend  ent 
samples.  It  is,  nevertheless,  reasonable  to  assume  that  the  general  pattern  of  the  difference 
decreasing  as  samplesizeincreases  would  also  apply  to  singlesamples.Themagnitudeof  the 
difference  for  a  given  sample  size  could,  in  principle,  be  estimated  from  the  20  samples  but 
this  calculation  has  not  been  performed  with  the  present  data. 
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Trend  in  absolute  differences  between  density  differences 


5000  10000  15000  20000  25000  30000 

Number  of  Observations 

Figure  8:  Trend  in  absolute  marginal  density  difference  as  sample  size  is  increased. 


Trend  in  relative  differences  between  density  differences 


5000  10000  15000  20000  25000  30000 

Number  of  Observations 


Figure  9:  Trend  in  relative  marginal  density  difference  as  sample  size  is  increased 

Together  with  Table  3,  Figure  8  and  Figure  9  show  that  as  the  sample  size  increases,  the 
difference  between  the  50,000-point  kernel  density  estimate  and  the  average  kernel  density 
esti  mate  becomes  smal  I  er.  Thi  s  i  s  as  expected .  1 1  can  al  so  be  observed  that  as  the  sampi  e  si  ze 
i  ncreases,  the absol  ute  margi  nal  d i ff erence gai  ned  by  ad d i  ng  extra  poi  nts  d i  mi  n i shes,  and  the 
relative  marginal  difference  remains  reasonably  constant.  It  is  difficult  to  determine  an 
acceptable  value  of  this  difference,  especially  since  the  samples  used  were  subsets  of  the 
50,000  point  dataset,  butitwould  appear  that  the  kernel  density  esti  matemaybesignificantly 
improved  by  increasing  the  number  of  points,  even  beyond  20,000. 

This  issue  requires  further  investigation  before  undertaking  a  large  project  using  kernel 
density  esti  mates  and  thevarioustechniquesproposed  in  this  report.  Additional  ly,  this  issue 
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is  further  compi  icated  by  the  high  degree  of  heterogeneity  in  the  density  i  n  different  areas  of 
theimpactdistri  bution.Thisissueisthereforealsorelated  to  thechoice of  bandwidth  si  nee  it 
may  be  appropri  ate  to  use  a  I  arger  band  w  i  dth  for  areas  w  here  the  data  are  more  sparse.  Thus, 
in  future  analysis  of  the  missi  le  i  mpact  data,  it  may  be  helpful  to  use  a  dynamic  bandwidth 
method. 

It  is  important  to  note,  however,  that  whilethere  may  bea  relatively  significant  change  in  the 
detai  Is  of  the  probabi  I  ity  density  functi  on  asthe  number  of  poi  nts  is  i  ncreased,  this  is  but  one 
measure  of  comparison  and  may  not  be  the  most  significant  measure. 

Appendix  B,  Figure  64  through  Figure  69,  shows  the  convex  exclusion  zones  generated  for 
each  of  thedatasets.  Visually,  wesee  that  theexclusi  on  zone  changes  very  little  as  the  number 
of  points  is  increased  beyond  15,000,  and  even  the  difference  between  5,000  data  poi  nts  and 
30,000  data  points  is  relatively  small.  Based  on  this  form  of  measure  and  this  particular 
dataset,  even  5000  poi  nts  may  be  sufficient  depending  on  the  degree  of  accuracy  required, 
especially  if  other  safety  margins  will  be  subsequently  applied. 

2.23.5  High  density  Regions  on  Boundary  of  Impact  Envelope 

A  further  issue  associated  with  the  heterogeneity  of  the  impact  density  is  that  if  there  is  an 
area  of  high  impact  density  that  lies  on  the  boundary  of  thei  mpact  envelope,  itispossiblethat 
a  considerable  amount  of  the  density  associated  with  that  area  could  be  spread  to  regions 
beyond  the  impact  envelope,  where  no  data  has  been  observed.  FI  owever,  this  may  not  be  as 
great  a  probi  em  as  it  may  seem,  because  Fi  gu  re  6  and  Figure?  show  that  i  n  the  no  fai  I  u  re  case 
(Figure  6),  where  the  areas  of  high  impact  density  are  much  nearer  to  the  boundary  of  the 
impact  envelope,  the  amount  of  probability  density  spread  to  regions  beyond  the  impact 
envelope  is  actually  smaller.  The  explanation  of  this  is  that  in  the  no  failure  case,  smaller 
band  w  i  dths  arechosen,  and  the  probabi  I  i  ty  i  s  spread  over  a  smal  I  er  area.  I  n  combi  nati  on  w  i  th 
the  relatively  low  probability  of  a  fai  I  ureas  compared  with  the  "no  fai  lure"  mode,  the  actual 
amount  of  probability  assigned  outside  of  the  simulated  impact  envelope  in  the  merged 
probability  density  function  iscommensurately  decreased.  Figure  6  and  Figure  7  show  that 
the  i  mpact  poi  nts  are  much  I  ess  di  spersed  i  n  the  no  fai  I  u  re  case  than  i  n  the  zero  def  I  ecti  on 
case. 

The  bandwidths  used  to  generate  thekernel  density  estimates  shown  inFigure6and  Figure? 
are  given  inTable4. 

Table  4:  Bandwidths  used  to  generate  kernel  density  estimates  in  no  failure  and  zero  deflection  cases 


Case 

Bandwidth  (x) 

Bandwidth  (y) 

No  Failure  Case  (Figure  6) 

881.6 

668.7 

Zero  Deflection  Case  (Figure  7) 

2922.6 

2294.3 

TheevidencefromFigure6and  Figure7suggeststhatthelocation  of  thehigh  density  regions 
i  n  fai  I  ure  mode  cases  rel  ati  veto  the  i  mpact  envel  ope  i  s  not  the  pri  mary  factor  i  n  determi  ni  ng 
the  amount  of  probability  density  assigned  to  regions  outside  the  impact  envelope.  The 
evidence  suggests  that  the  values  of  the  bandwidths  have  a  much  greater  influence  on  this. 
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From  the  discussion  in  this  section,  itisdear  that  there  are  many  factors  to  be  considered  in 
generating  an  accuratekernei  density  estimate.  There  can  be  no  "biind"  process  for  producing 
a  kernei  density  estimate  that  is  appropriate  for  any  dataset  that  may  arise,  it  is  necessary  to 
ensure  that  appropriate  statisticai  inputs  have  been  used,  and  that  the  resuits  obtained  are 
consistent  both  with  the  data  and  with  what  might  reasonabiy  be  expected  in  reaiity. 
Therefore,  any  procedure  for  generating  a  kernei  density  estimateshouid  be  reviewed  by  a 
panei  of  experts,  inciuding  both  statisticians  and  weapons  experts. 

2.3  C  reati ng  overal I  PD  Fs  for  a  given  scenario 

2.3.1  Generating  overaii  PDFsfrom  individuai  faiiure  mode  PDFs 

in  the  previous  sections  we  described  the  process  of  generating  a  numericai  probabiiity 
density  function  from  a  given  dataset.  For  thepurposes  of  theRSlT  it  is  necessary  to  generate 
probabiiity  density  functions  for  a  given  scenario.  Recaii  that  a  scenario  is  made  up  of 
i  nformati  on  coveri  ng  ai  i  known  fai  i  ure  modes,  together  with  the"no  fai  i  ure"  case,  for  a  given 
set  of  other  i  nput  parameters. 

For  the  i  i  i  ustrati  on  of  thetechni  que  of  generati  ng  a  density  esti  mate  for  a  parti  cui  ar  scenari  o 
by  an  appropriate  combi  nation  ofthedensity  esti  matesfortheindividuai  faiiuremodes,TRC 
Mathematicai  Modeiiing  has  incorporated  oniytwofaiiure  modes.  Thatis,  no  faiiure  (faiiure 
modeO)  and  zero  defiection  (faiiure  model).  The  reason  for  this  is  that  there  was  oniy  one 
scenari  o  for  w  h i  ch  d ata  correspond  i  ng  to  mu i  ti  pi  e fai  i  u  re  mod es  w  ere  avai  i  abi  e  (namei  y,  the 
scenario  used  throughoutthis  report).  Therefore,  there  was  no  other  scenario  that  couid  have 
been  used  to  iiiustrate  the  techniques  proposed  in  this  report.  The  data  avaiiabiefor  this 
scenari  o  covered  on  i  y  fai  i  u  re  mod  es  0  and  1.  There  w  as  a  d  ataset  provi  d  ed  that  correspond  ed 
to  a  further  faiiure  mode  (singie  actuator  freeze  -  faiiure  mode  2),  but  this  dataset  did  not 
correspond  to  the  same  scenario  as  the  data  for  faiiure  modes  0  and  1.  in  theory,  it  is 
straightforward  to  i  ncorporate  any  number  of  faiiure  modes  with  this  technique,  provided 
that  an  expert  panei  of  some  form  is  abieto  provide  information  on  the  probabiiity  of  each 
faiiure  mode. 

Denote  by  p;  the  probabiiity  of  faiiure  mode/,  z  =  1, N,  and  ietpo  bethe  probabiiity  that  no 
faiiure  occurs.  Then, 


Po  =  l-Si>o  Pi. 

Let  PD  F,s  denote  the  probabiiity  density  function  for  faiiure  mode/,  /  =  0,  ...  N,  of  a  given 
seen  ar  i  0  S ,  w  h  ere  N  i  s  th  e  n  u  mber  of  f  ai  i  u  re  mod  es.  PDFP  i  s  a  matr  i  X  w  h  ose  (x,yy^^  ement  i  s 
the  probabi  i  ity  of  missi  ie  i  mpact  i  n  gri  d  square  (x,y)  given  fai  i  ure  mode/  of  scenari  o  S  occurs. 
The  probabiiity  density  function  for  a  given  scenario,  FDPs,  is  given  by 

PDFs=Sj  piPDFiS. 

As  an  exampie,  for  the  data  examined  in  Section  2.2.2,  suppose  we  have: 


21 


DSrO-TR-2292 


Failure  Model:  Fins  locking  to  zero  deflection,  probability  pi  =0.0001. 

Probabi  I  ity  of  no  fai  I  ure,  po  =  1  -  pi  =  1  -  0.0001  =  0.9999. 

Combining  the  kernel  smoothing  generated  probability  density  functions  for  these  two 
modes,  according  to  their  respective  probabilities  of  occurrence,  we  obtain  the  probability 
density  function  shown  in  Figure  10,  below. 


LoglO  or  Kernel  Density  Estimate 
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Figure  10:  Overall  PDF  for  scenario  in  Table  2 

The  interior  of  the  combined  PDF  isvery  similar  to  that  of  thenofailurePDF  (Figure6).This 
is  clearly  a  sensible  result  since  this  case  contributes  the  vast  majority  of  the  probability 
weighting.  Flow  ever,  itcan  beseen  from  Figure  lOthatthetailsofthecombined  PDF  arevery 
similar  to  the  zero  deflection  PDF  (Figure  7).  This  is  because  the  tail  decays  much  more 
rapidly  in  the  no  fai  I  ure  case,  asdiscussed  earlier.Thus,  inthetail  regionsthezero  deflection 
PDF  has  much  greater  density,  and  hence  contributes  more  to  the  weighted  average,  even 
though  it  is  weighted  by  a  probability  of  only  lO^. 

2.3.2  Generating  overall  PDFsfrom  individual  fai  lure  mode  PDFs  and  using  the 
Maximum  Energy  Boundary 

The  probabi  I  ity  densi  ty  functi  ons  generated  i  n  the  previ  ous  secti  ons  were  created  by  mi  xi  ng 
2-dimensional  Gaussian  distributions  (si nee  the  kernel  function  chosen  was  the  bivariate 
normal  density).  Recall  that  the  general  idea  is  to  take  each  individual  impact  point  and 
"spread"  it  over  a  wider  area,  with  greatest  concentration  atthei  mpact  poi  nt  itself.  FI  owever, 
the  normal  density  has  non-zero  probabi  I  ity  at  any  poi  nt  (x,y).  Therefore,  this  results  i  n  a  non¬ 
zero  probabi  I  i  ty  d  ensi  ty  even  at  gri  d  squ  ares  h  uge  d  i  stances  from  the  i  mpact  poi  nt  i  tsel  f.  Thi  s 
is  unrealistic  in  practice. 

Weapons  Systems  Division  of  DSTO  has  models  for  generating  a  "Maximum  Energy 
Boundary"  (M  EB)  for  gi  ven  test  scenari  os.  The  M  axi  mum  Energy  Boundary  is  the  maxi  mum 
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distance,  in  any  given  direction,  that  a  missile  might  travel  given  various  factors,  including, 
for  example,  fuel  load  of  the  missile. 

Clearly,  no  impact  point  should  lie  outsidethe  maximum  energy  boundary.  However,  the 
models  used  to  generate  both  the  i  mpact  poi  nts  and  the  maxi  mum  energy  boundary  may  not 
be  perfect.  Therefore,  in  the  event  that  one  or  more  impact  points  lie  beyond  the  maximum 
energy  boundary,  itwould  be  desirable  for  the  software  to  beableto  detect  and  report  this. 
This  has  not  been  implemented  in  the  current  analysis. 

If  weare  provided  with  an  MEBfor  agiven  missileand  test  scenario,  wecan  combine  this 
i  nformati  on  w  i  th  the  overal  I  PD  F  generated  i  n  the  previ  ous  secti  on  to  form  a  potenti  al  I  y  more 
realistic  probability  density  function. 

Denote  by  MEBPDFs  theM  EB  constrained  PDF  for  scenario  S. 

Denote  by  MEB^  the  indicator  matrix  defining  the  M  EB  over  the  same  grid  layout  used  to 
definePDPs.  The(x,y)  co-ordinateof  MEPs  is  i  jf  gnd  square(x,y)  iswithin  theMEB  and  0 
otherwise. 

Then  the  total  probability  massofPDPs  that  falls  within  theMEB  is  given  by 


Pmeb  =Sx  Sy  PDFs(x,y) .  M  EBs(x,y), 


and 


MEBPDF\x,y) 


PDF\x,y) 

Pmeb  'F  MFB\x,y)  =  \, 


0  otherwise. 


Figure  11  shows  an  artificial  MEBand  Figurel2showsthecorrespondingMEBPDF  based  on 
PDF  shown  in  Figure  10.  The  software  used  to  generate  the  M  EBP DF  shown  in  Figure  12 
al  lows  the  user  to  specify  an  appropriate  maximum  energy  boundary.  The  form  of  the  M  EB 
input  in  the  function  is  of  a  grid  of  Os  and  Is,  wherea  lind  i  cates  that  a  grid  square  is  within 
the  maximum  energy  boundary,  and  a  0  indicates  otherwise.  Therefore,  in  order  to  apply  a 
realistic  maximum  energy  boundary,  it  is  necessary  to  generate  such  a  maximum  energy 
boundary  and  convert  it  to  the  appropri  ate  form  for  the  i  nput  to  the  functi  on. 
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Figure  11:  Artificial  maximum  energy  boundary. 
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Figure  12:  Overall  MEBPDF  for  scenario  in  Table  2 


2.4  Putting  it  all  together 

In  this  section  we  bring  together  the  results  developed  in  previous  sections  and  provide 
algorithms  for  computing,  for  a  given  scenario: 

•  The  expected  number  of  casualties. 

•  An  exclusion  zonefor  agiven  exclusion  probability  level. 

•  The  probability  of  a  missile  going  outside  a  given  area. 
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2.4.1  Predicting  the  number  of  people  exposed  to  risk  of  injury 

We  are  now  i  n  a  position  to  esti  mate  the  expected  number  of  people  exposed  to  risk  of  i  njury 
for  a  given  test  scenario. 

Denote  by  POP  the  matrix  defi  ni  ng  the  popul  ati  on  density  over  thesame  gri  d  I  ayout  used  to 
define  PDFS.  The  (x,y)  co-ordi  nate  of  POPs  isthe  expected  number  of  peoplein  grid  square 

(x,y). 


Let  A  be  the  relative  size  of  the  impact  area  of  the  missile  compared  to  the  size  of  a  grid 
square. 

A  ssu  mi  ng  that  the  i  mpact  poi  nt  of  a  mi  ssi  I  e  i  s  w  i  th  i  n  a  gri  d  squ  are  and  that  the  i  mpact  area  i  s 
al  w  ays  f  u  1 1  y  contai  ned  w  i  th  i  n  th  at  g  r  i  d  squ  are,  th  e  expected  numberofpeople  exposed  to  r  i  sk 
of  injury,  E[l],  is  given  by 

E[l  ]  =A  Sx  Sy  PDFs(x,y) .  POP(x,y),  or 
E[l]  =A  Sx  Sy  MEBPDFs(x,y) .  POP(x,y), 

depending  on  whether  an  MEB  is  available,  and  A  is  the  fraction  of  the  grid  square  that  is 
affected  when  a  missile  impacts. 

Wenoted  earlier  that  it  may  not  be  necessary  to  accurately  esti  mate  the  probability  density 
function  central  to  the  impact  envelope.  When  computing  expected  injury  rates  an  accurate 
esti  mate  for  the  enti  re  regi  on  shoul  d  be  used .  FI  ow  ever,  we  have  pragmati  cal  I  y  assumed  that 
a  standard  operational  practice  will  be  to  clear  people  from  the  central  impact  region  and 
hence  any  errors  that  mi  ght  be  i  ntrod  uced  d  ueto  the  I  ess  accurate  i  nteri  or  probabi  I  i  ty  density 
estimates  will  minimal. 

Figure  13  shows  an  artificial  population  density  map  on  thesamegrid  as  used  to  compute  the 
probability  density  function  developed  in  Section  2.3.  As  with  the  maximum  energy 
boundary,  thesoftware used  to  computetheexpected  number  of  casualtiesallowstheuser  to 
input  an  appropriate  population  density.  It  should  be  noted,  however,  that  the  input 
population  density  must  be  on  thesamegrid  asthekernel  density  esti  mate.  Alternatively,  it 
may  be  simpler  to  deliberately  generate  the  kernel  density  esti  mate  on  thesamegrid  as  the 
population  density  function  available. 
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Artificial  Population  Density 


Distance  from  rrassile  launcti  (km) 


Figure  13:  Artificial  population  density 

With  a  nominal  value  of  A  of  0.1,  in  this  example  the  expected  number  of  casualties  is 
6.150934xiaio  if  the  PDF  is  used  and  is  6. 117528xiaio  if  M  EBPDF  is  used. 

2.4.2  Creating  an  exclusion  zone 

Of  potential  interest  is  the  idea  of  determining  a  convex  boundary  around  the  missile  range 
such  that  the  probability  of  a  missile  impacting  outside  the  boundary  is  less  than  some  pre¬ 
determined  level,  for  example,  10"^  This  boundary  defines  a  potential  "exclusion  zone"  for 
clearance  of  personnel  and  /  or  members  of  the  public. 

One  method  for  creating  such  an  exclusion  zone  with  probability  level  s  isthefollowing: 

1.  Sort  PDFs  from  highest  to  lowest  probability  density  across  all  grid  squares. 

2.  Sumthesorted  I  ist  from  highestto  lowest  probability  density,  stopping  when  thetotal 
probabi  I  i  ty  w  i  thi  n  the  excl  usi  on  zone  i  s  greater  than  1-sd  Store  the  I  i  st  of  gri  d  squares 
used  in  the  sum. 

3.  Create  a  convex  hull  around  the  grid  squares  used  in  the  sum  of  Step  2. 

FigureMshowsa  convex  10®  exclusion  zone  created  around  theprobabi  I  ity  density  function 
developed  in  Section  2.3.  The  points  plotted  in  Figure  14  represent  the  grid  squares  included 
in  the  raw  exclusion  zone.  Figure  Malso  shows  the  convex  hull  around  this  exclusion  zone. 
Now,  the  raw  exclusion  zone,  illustrated  by  the  points  shown  in  Figure  14,  contains  a  total 
probability  of  at  least  1  -  10®.  Therefore,  the  convex  hull  shown  in  Figure  14  is  clearly  a 
conservative  10®  exclusion  zone.  It  is  possible  to  obtain  a  less  conservative  exclusion  zone 
using,  one  of  a  number  of  different  methods,  but  this  would  be  considerably  more 
compi  i  cated  and  more  computati  onal  I  y  i  ntensi  ve.  A  dd  i  ti  onal  I  y ,  i  t  may  not  be  necessary  i  n  thi  s 


1  Correct  when  the  total  probability  across  the  PDF  <=  1.  N  ot  correct  when  the  PDF  has  been  over 
estimated. 
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application,  since  the  probability  added  by  taking  the  convex  hull  of  the  raw  exclusion  zone 
may  be  very  smal  I .  This  is  a  possi  bi  I  ity  that  may  be  worthy  of  further  i  nvestigati  on. 


Conservative  Convex  Exclusion  Zone 


0  10000  20000  30000 

Distance  from  missiia  launch  (m) 


Figure  14:  Convex  10-^  exclusion  zone  for  scenario  described  in  Table  2 

2.4.3  Computing  probability  of  leaving  the  range 

Another  possibility  of  potential  interest  is  the  idea  of  determining  the  probability  that  a 
mi  ssi  I  e  I  eaves  a  fi  ri  ng  range.  This  can  be  computed  i  n  a  strai  ghtforward  fashi  on  from  the  data 
generated  above. 

Denote  by  R  the  i  ndi  cator  matrix  defi  ni  ng  thefi  ri  ng  range  over  the  same  gri  d  I  ayout  used  to 
definePDFs.The(x,y)  co-ordinate  of  R  isOif  grid  square(x,y)  is  within  thefi  ring  range  and  1 
otherwise. 

Then  the  total  probability  mass  of  PDRs  that  falls  outside  the  firing  range,  Pr  ,  is  given  by 

Pr=Sx  Sy  PDFs(x,y) .  R(x,y). 

If  the  Maximum  Energy  Boundary  is  known,  then  Pr  can  be  computed  using 

Pr  =Sx  Sy  M  EBPDFs(x,y) .  R(x,y). 

Figure  15  shows  a  firing  range  boundary  overlaid  on  theMEB  probability  density  function 
developed  in  Section  2.3.  In  this  example,  the  probability  of  the  missile  leaving  the  range  is 
1.954699x10'^. 
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Figure  15:  MEBPDF  for  scenario  in  Table  2  showing  boundary  of  an  artificial  firing  range 


2.5  Symmetric  Scenarios 

Generating  a  database  of  impact-point  data  sets  for  all  scenarios  of  interest  can  be  an 
expensive  operation.  One  idea  that  may  make  it  simpler  to  populate  such  a  database  is  the 
idea  of  symmetric  scenarios,  that  is,  whether  symmetric  initial  conditions  may  produce 
symmetric  impact  densities.  If  this  were  the  case,  then  it  may  be  possibleto  obtain  density 
esti  mates  for  multi  pi  escenari  os  by  computi  ng  a  si  ngl  e  kernel  density  esti  mate  for  a  parti  cul  ar 
scenario  and  then  using  reflectionsof  this  density  for  other  scenarios. 

In  order  to  test  this  hypothesis,  kernel  density  estimates  have  been  computed  for  two 
scenarios  with  symmetric  initial  conditions,  and  a  measure  of  the  difference  between  these 
kernel  density  estimates  has  been  calculated,  where  one  density  was  reflected.  These  kernel 
density  esti  mates  were  based  on  sampi  es  of  20,  CXX)  observations.  Themeasureof  thedifference 
between  thedensitiesisthesame  as  that  described  in  Section  2.2.3.  If  the  hypothesis  is  true, 
then  when  one  kernel  density  estimate  is  reflected,  the  kernel  density  estimates  should  be 
similar,  and  themeasureof  the  difference  between  them  should  be  small.  In  order  to  be  able 
to  assess  the  value  of  the  difference  that  should  be  expected  for  two  'similar'  densities,  20 
pai  rs  of  d  i  sjoi  nt  sampI  es  from  the  same  scenari  o  w  ere  generated ,  and  the  d  i  fferences  betw  een 
the  kernel  density  estimates  for  each  pair  were  calculated.  These  samples  also  contained 
20,  (XX)  observations.  From  this  sample  of  20  differences,  a  mean,  variance  and  99%  confidence 
interval  havebeen  calculated. Thevalueof  thedifference  between  thekernel  density  estimates 
for  the  sy  mmetr  i  c  scenari  os  w  as  then  used  to  cal  cu  I  ate  a  quanti  ty  know  n  as  thep-value .  I  n  thi  s 
case,  the  interpretation  of  the  p-value  is  that  it  gives  the  probability  that  the  two  kernel 
density  estimates  are  exactly  symmetric.  If  the  p-value  is  large  enough,  then  it  may  be 
reasonable  to  use  reflected  kernel  density  esti  mates  to  esti  mate  the  densities  for  symmetric 
seen  arios.Flowever,ifthep-valueisvery  smal  I ,  th  en  th  i  s  tech  n  i  q  u  e  may  n  ot  be  ap  prop  r  i  ate. 
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It  must  be  noted,  however,  that  this  investigation  has  been  carried  out  for  only  one  pair  of 
symmetric  scenarios,  so  it  may  not  be  reasonable  to  extrapolate  these  results  to  apply  to  all 
scenari  os.  1 1  must  al so  be  noted  that  the  scenari  os  used  i  n  thi s  i  nvesti  gati  on  are symmetri  c  i  n 
thex-axis.  In  order  to  be  ableto  actually  apply  these  ideas,  it  would  be  necessary  to  conduct  a 
much  moreextensi  vei  nvesti  gati  on.  An  expert  in  thesystem  under  test,  for  examplethesenior 
si  mul  ati  on  model  engi  neer,  may  al  so  be  abl  e  to  determi  new  hether  i  t  i  s  appropri  ate  to  appi  y 
these  ideas  to  the  ground  impact  data  points  based  on  their  knowledge  of  the  system's 
behaviour. 

Thefol  I  ow  i  ng  f  i  gu  res  provi  de  a  vi  sual  i  nd  i  cati  on  of  the  extent  of  the  symmetry  of  the  i  mpact 
distribution  between  the  two  scenarios.  Figure  16  and  Figure  17  show  scatterplotsfor  two 
symmetric  scenarios,  each  with  20,000  observations.  Si  nee  there  were  50,000  observations 
avai  I  abl  efor  thesecond  symmetri  c  scenari  o,  Fi  gu  re  17  si  mpl  y  shows  a  typi  cal  subset  of  20,000 
observations.  Figure  18  shows  a  combined  scatterplot  of  both  scenarios,  where  the  dataset 
correspond  i  ng  to  the  second  symmetri  c  scenari  o  has  been  refi  ected .  Wesee  i  n  that  fi  gurethat 
the "  hook"  regi  ons  al  i  gn  very  cl  osel  y ,  but  vi  su  al  I  y  there  appears  to  be  a  si  gn  i  f  i  cant  d  i  ff  erence 
between  the  two  scenari  os  across  the  rest  of  the  i  mpact  regi  on.  This  suggests  that  either  the 
data  were  generated  differently  for  points  outside  of  the  "hook"  or  that  the  impact 
distributions  are  in  fact  different.  Figure  19  and  Figure  20  show  kernel  density  estimates  for 
the  symmetric  scenari  os,  wherethedensity  for  thesecond  scenario  has  again  been  reflected. 
Thefiguresshow  thattheground  impact  points  are  not  symmetric,  despite  the  symmetry  of 
the  scenari  os. 
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Figure  16:  Scatterplot  for  first  symmetric  scenario 
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Impact  Point  Distribution 
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Figure  17:  Scatterplot  for  second  symmetric  scenario 
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Figure  18:  Scatterplot  showing  both  symmetric  scenarios  (where  one  dataset  is  reflected) 
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Logl  0  or  Kernel  Density  Estimate 


«-2 

■ 

■ 

■ 

<-5 

e 

<6 

■ 

<7 

■ 

<.$ 

■ 

<•9 

■ 

<.10 

D 

<-11 

D 

<•12 

□ 

«13 

c.ia 

<•15 

<.16 

<•17 

<•18 

<-19 

a 

<-20 

R  - 


0  10  20 
Distance  Irom  rrasstle  launch  (km) 


Figure  19:  Kernel  density  estimate  for  first  symmetric  dataset 
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Figure  20:  Reflected  kernel  density  estimate  for  second  symmetric  dataset 
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3.  I  nvestigation  of  A  ppropriate  Size  of  D  atasets  and 
Resolution  of  Kernel  Density  Estimates 

3.1  Proced  u  re  of  I  n vesti  gati  on 

Thesecondsetoftasksundertaken  byCDCIN  fortheRSTT  project  included  an  investigation 
ofthemostappropriatenumber  of  observations  and  resolution  of  the  KDEs  (Kernel  Density 
Esti  mates)  to  be  generated . 

Due  to  the  amount  of  time  required  to  generate  each  data  point,  and  the  vast  number  of 
datasets  to  be  produced,  it  was  deemed  i  mportant  to  i  n  vesti  gate  the  rel  ati  onshi  p  between  the 
number  of  observations  used  and  the  quality  of  the  KDE  produced. 

A  second  factor  affecting  the  quality  of  the  KDE  \5  the  resolution  of  theKDE. Throughout  this 
document,  the  term  resolwf/on  will  refertothenumber  of  interval  s  into  which  thex-  and  y- 
d  i  mensi  ons  of  the  i  mpact  d  i  stri  buti  on  are  d  i  vi  ded .  For  exampi  e,  for  a  K  D  E  w  i  th  a  resol  uti  on  of 
16,  the  area  of  the  i  mpact  d  i  stri  buti  on  w  ou  I  d  be  d  i  vi  ded  i  nto  a  16x  16  rectangu  I  ar  gri  d ,  and  the 
KDE  process  esti  mates  the  probabi  I  ity  mass  of  the  i  mpact  di  stri  buti  on  that  i  s  contai  ned  with!  n 
each  rectangle.  In  order  to  determinethe  most  appropriate  number  of  observations,  itwasalso 
necessary  to  invest!  gate  the  relationship  between  the  resol  uti  on  and  the  accuracy  of  the  KDE. 
We  test  the  changing  accuracy  of  a  series  of  KDEs  of  a  given  scenario  by  examining  the 
d  i fference  betw een  an  esti  mated  K  D  E  and  the " overal  I  K  D E "  of  the  i  mpact  d i  stri  buti  on .  1 1  w as 
also  necessary  to  invest! gate  the  relationship  between  the  number  of  observations  and  the 
resol  uti  on,  si  nee,  for  exampi  e,  a  lOOOx  1000  KDE  based  on  onl  y  10  observati  ons  would  provi  de 
a  misleading  amount  of  detail,  and  thismay  affect  the  accuracy  of  theKDE. 

In  the  foil  owing,  we  take  as  the  overal  I  KDE  theaverageof  20  independent  KDEs,  based  on 
50,000  observations  each.  Therefore,  the  overall  KDE  is  based  on  a  total  of  1,000,000 
observati  ons.  The  reason  that  we  have  used  an  average20  KDEs  based  on  50,000 observati  ons 
each,  rather  than  a  single  KDE  based  on  all  1,000,000  observati  ons,  is  that  computer  memory 
constraints  did  not  allow  1,000,000  observations  to  be  processed  simultaneously.  The 
measures  of  difference  will  be  referred  to  as  the  Mean  Log  Scaled  (MLS)  difference,  the 
Exclusion  Zone  (EZ)  difference,  and  the  Total  Log  difference.  Each  of  these  functions 
implements  a  different  way  of  measuring  the  difference  between  two  probability  mass 
functions  defined  over  the  same  range,  each  highlighting  slightly  different  aspects  of  this 
difference. 

The  data  used  in  this  investigation  was  supplied  by  DSTO  from  the  data-set  "fault-set  all 
actuators  zeroed  GGM  012  1  -  1,000,000",  on  23  September  2005.  All  figures  in  this  section, 
except  for  Figure  28,  were  generated  by  partitioning  thedata  into  subsets  of  a  particular  size, 
and  cal  cul  ati  ng  the  average  d  ifference  of  the  su bsets  from  the  overal  I  K D E,  for  vari  ous  si zes 
and  resol  uti  ons.  Therefore,  all  figures,  except  for  Figure  28,  are  based  on  the  entire  dataset  in 
the  file  mentioned  above.  Figure  28  is  based  on  a  particular  subset  of  100  observati  ons  from 
thisfile. 
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3.2  Results  and  Discussion 

Thefirst  measure  of  difference  calculated  wastheMean  Log  Scaled  (M  LS)  difference.  TheM  LS 
difference  for  two  KDEs  was  calculated  by  the  foil  owing  formula: 


3  Z 


^  all  grid  squares 


(P  test  P  overall  ) 

P overall  P overall  ) 


whereptesf  is  the  probabi  I  ity  mass  for  the  KDE  bei  ng  tested,  for  a  parti  cul  ar  grid  square,  poveraii 
i  s  the  correspond  i  ng  probabi  I  i  ty  mass  for  the  overal  IKDE,andnisthe  resol  uti  on  of  the  K  D  Es. 

That  is,  for  each  grid  square,  take  the  square  of  the  difference  between  the  two  probability 
masses,  divideby  a  scaled  version  of  the 'correct'  KDE,  take  the  log  (base  10).  Finally,  take  the 
averageof  this  quantity  over  all  grid  squares. 

The  scaling  factor  has  been  chosen  such  that  an  overall  probability  of  p  is  treated 
symmetrically  to  an  overall  probability  of  1-p.,  and  the  log  has  been  taken  in  order  to 
emphasise  larger  differences. 

A  typical  plot  of  the  average  "error"  of  a  test  KDE  from  the  overall  KDE  against  the 
resolution,  for  a  fixed  number  of  observations  is  shown  in  Figure  21. 


Trend  in  'MLS  error'  of  KDE  with  std  errors 


3  4  5  6  7  8 


Log  resolution  (base  2) 

Observations:  1000. 

Figure  21:  Average  Mean  Log  Scaled  error  vs  resolution  for  1000  observations 


Figure  21  indicates  that  as  the  resolution  is  increased,  the  "error"  of  the  KDE  initially 
decreases  by  a  small  amount,  but  for  a  resol  uti  on  greater  than  32  (Son  thex-axisofthefigure 
as  i  t  i  s  show  n  as  the  I  og  to  base  2),  the  i  mprovement  i  s  negl  i  gi  bl  e.  A  second  poi  nt  i  1 1  ustrated 
by  Figure  21  is  that  as  the  resolution  is  increased,  the  standard  deviation  of  the  "errors" 


33 


DSrO-TR-2292 


initially  increases  by  a  small  amount.  However,  again,  for  a  resolution  greater  than  32,  the 
difference  in  standard  deviation  is  negligible.  For  these  reasons,  it  was  decided  that 
resolutions  of  16  and  32  were  of  greatest  interest,  and  for  the  remaining  measures  of 
difference,  plots  were  only  generated  for  the  average  error  against  the  number  of 
observations,  for  fixed  resolutions  of  16  and  32.Theseplotsareshown  in  Figure  22-  Figure  26. 


Trend  in  'MLS  error'  of  KDE  with  std  errors 


500  1000  1500  2000 


Number  of  Obseivations 
Resolution:  16. 

Figure  22:  Average  Mean  Log  Scaled  error  vs  number  of  observations  for  a  resolution  of  16 


Trend  in  'Totai  Log  error'  of  KDE 


500  1000  1500  2000 


Number  of  ObseiYations 
Resolution:  16. 

Figure  23:  Average  Total  Log  error  vs  number  of  observations  for  a  resolution  of  16 
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Trend  in  'IVILS  error'  of  KDE  with  std  errors 


500  1000  1500  2000 


Number  of  Observations 
Resolution:  32. 

Figure  24:  Average  Mean  Log  Scaled  error  vs  number  of  observations  for  a  resolution  of  32 


Trend  in  'Totai  Log  error'  of  KDE 


500  1000  1500  2000 


Number  of  Observations 
Resolution:  32. 

Figure  25:  Average  Total  Log  error  vs  number  of  observations  for  a  resolution  of  32 
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Trend  in  'IVILS  error'  of  KDE  with  std  errors 
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Figure  26:  Average  Mean  Log  Scaled  error  vs  number  of  observations  for  a  resolution  of  32 


The  most  obvi  ous  observati  on  from  Fi  gure  22  -  Fi  gure  26  i  s  that  as  the  number  of  observati  ons 
increases,  it  appears  that  the  "error"  of  the  KDE  from  the  overall  KDE  will  continue  to 
decrease  indefinitely.  Therefore,  in  the  feasible  range  of  the  number  of  observations,  there 
appears  to  be  no  threshold  above  which  a  "perfect"  KDE  is  obtained,  for  which  no  further 
improvement  can  be  gained.  Similar  analysis  during  Stage  1  of  the  RSTT  project  also 
suggested  that  no  such  threshold  exists  up  to  35,000  observations. 

The  second  poi  nt  to  observe  from  Fi  gure  22  -  Figure  26  is  that  as  the  number  of  observati  ons 
i  ncreases,  the  standard  error  also  appears  to  decrease.  As  with  the  average  error,  this  trend 
also  appears  to  conti  nuei  ndefi  nitely,  ascan  be  observed  from  the  standard  error  barsfor  5000 
and  10,000  observati  ons,  in  Figure  26,  above. 

The  final  measure  of  difference  calculated  was  the  exclusion  zone  difference.  Figure  30  and 
Figure  31,  below,  show  the  average  errors  with  standard  error  bars,  for  the  Exclusion  Zone 
difference.  The  Exclusion  Zonedifferenceismadeup  of  two  components.  These  components 
are  plotted  in  green  and  red  in  Figure  30  and  Figure 31,  below.  The  green  components 
represent  the  percentage  of  gri  d  squares  that  were  conservatively  i  ncl  uded  i  n  the  excl  usion 
zonefor  the  test  KDE  and  the  red  components  represent  the  percentage  of  grid  squares  that 
were  incorrectly  omitted  from  the  exclusion  zonefor  the  test  KDE.  The  black  components 
rep  resent  the  total  Exclusion  Zoneerror,  which  isthesum  of  the  red  and  green  components. 
Note  that  each  of  these  figures  is  given  as  a  percentage  of  all  grid  squares  included  in  either 
the  given  test  KDE,  the  overall  KDE  or  both. 

Figure  27  and  Figure  28,  below,  show  exclusion  zones  for  the  overall  KDE  and  a  test  KDE 
based  on  100  observations,  with  a  resolution  of  16.  Figure  29  shows  the  components  of  the 
exclusion  zone  difference  for  these  two  exclusion  zones.  For  this  particular  test  KDE,  the 
percentage  of  grid  squares  that  wereconservatively  included  in  the  excl  usion  zonefor  thetest 
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KDE  (that  is,  the  green  component)  is  equal  to  14.06%,  and  the  percentage  of  grid  squares  that 
were  incorrectly  omitted  from  the  overall  KDE  (that  is,  the  red  component)  is  equal  to  16.68%. 
Therefore,  the  total  excl  usion  zone  error  between  thetwo  exclusion  zones  shown  in  Figure  27 
and  Figure  28  is  equal  to  30.74%. 


Exclusion  Zone 


^ - 1 - 1 - 1 - r 

-10  0  10  20  30 

Distance  from  missile  iaunch  (km) 

Figure  27:  Overall  exclusion  zone 


Exclusion  Zone 


Figure  28:  Exclusion  zone  for  a  KDE  with  100  observations  and  a  resolution  of  16 
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Exclusion  Zone  Difference 
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Distance  from  missile  launch  (km) 

Figure  29:  Components  of  the  exclusion  zone  difference  for  the  two  exclusion  zones  shown  above 


Trend  in  'EZ  error'  of  KDE 


Figure  30:  Average  Exclusion  Zone  error  vs  number  of  observations  for  a  resolution  of  16 
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Trend  in  'EZ  error'  of  KDE 


Figure  31:  Average  Exclusion  Zone  error  vs  number  of  observations  for  a  resolution  of  32 

I  n  CO  ntrast  withtheearlierfigures,  Figure30andFigure31indi  cate  th  at  th  e  i  mp  rovement  i  n 
the  accuracy  of  the  excl  usion  zone  di  mi  nishes  as  the  number  of  observations  i  ncreases. 

The  Exclusion  Zone  difference  also  appears  to  have  a  number  of  other  useful  qualities.  It 
seems  to  have  the  smallest  standard  error  of  all  the  measures  of  difference  calculated,  and  it 
also  seems  to  give  the  most  consistent  and  smooth  trend  as  the  number  of  observations 
increases.  Additionally,  it  can  be  seen  that  the  Exclusion  Zone  error  is  composed 
predominantly  of  the  conservative  portion  of  the  error,  plotted  in  green. 

Regarding  the  selection  of  the  number  of  observations,  the  red  parts  of  Figure  30  and 
Figure 31  suggest  that  if  the  number  of  observations  is  at  least  600,  then  the  average 
percentage  of  grid  squaresincorrectlyomittedfromtheexclusionzoneisaround  5%.  Itisalso 
I  i  kel  y  that  thi  s  percentage  w  i  1 1  be  consi  d  erabi  y  red  u  ced  w  hen  the  excl  usionzoneis  converted 
to  a  convex  hull.  With  the  current  software,  it  is  not  possibleto  determine  how  much  this 
percentage  woul  d  be  reduced,  because  it  is  not  possi  bleto  apply  this  measureof  differenceto 
the  convex  version  of  the  excl  usion  zone. 

Overall,  itappearsthatthequality  of  theKDE  generated  continues  to  improveasthenumber 
of  observations  is  increased.  This  suggests  that  as  many  observations  as  possible  should  be 
generated,  since  any  increase  will  result  in  improved  estimates  of  the  KDE.  Flowever,  the 
observed  trends  in  the  Exclusion  Zone  difference  measure  suggest  that  if  any  less  than  600 
observations  are  generated  for  each  scenario,  the  percentage  of  grid  rectangles  incorrectly 
omitted  from  the  excl  usion  may  be  unnecessarily  high,  of  theorder  of  5%.  Consequently,  600 
observations  is  recommended  as  a  lower  bound  on  the  number  of  observations  generated  for 
a  given  scenario. 
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3.3  Recommendations 

We  recommend  thefol lowing: 

1.  That  KDE  resolutions  beyond  16  x  16  and  32  x  32  do  not  provide  significantly  more 
accurate  i  nformation  and  hence  16x  16  or  32  x  32  resol  uti  ons  appear  to  besuitabi  efor 
the  devel  opment  of  Range  Safety  T empi ates. 

2.  That  at  I  east  600  observations  (impact  data  points)  be  used  in  generating  KDEs  for  a 
given  scenario. 

3.  That  a  more  precise  estimate  of  the  average  percentage  of  grid  squares  incorrectly 
omitted  from  the  exclusion  zone  be  obtained. 
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4.  Issues  in  KDE  generation  -  Dynamic  Bandwidth 

Seiection 

The  third  set  of  tasks  undertaken  by  CDCIN  for  the  RSTT  project  included  researching 
problems  adjacent  to  the  PDF  generation  problem.  During  the  course  of  thisthird  set  of  tasks 
however,  it  became  apparentthatfor  input  datasets  with  certain  properties,  the  kernel  density 
estimation  algorithm  proposed  previously  produced  an  inappropriate  kernel  density  estimate. 
An  exampleof  such  a  KDE  isgiven  in  Figure 32,  below. 


Impact  Data 


KDE 


Distance  from  missile  launch  (km) 


Figure  32:  Scatterplot  and  KDE  with  inappropriate  bandwidth  selection 


The  impact  data  comprises  600  points,  with  the  bulk  of  the  points  (approximately  90%) 
concentrated  around  the  point  (8,4). 


Ideally,  the  KDE  for  this  dataset  should  contain  reasonably  high  levelsof  probability  for  the 
majority  of  thearea  within  theconvex  hull  of  the  impact  points.  A  scan  be  seen  intheKDE  in 
Figure  32,  there  are  large  areas  within  theconvex  hull  that  contain  effectively  Oprobability 
mass.  Upon  further  investigation,  it  was  found  that  the  source  of  this  problem  was  the 
bandwidth  selection  algorithm.  More  specifically,  it  was  found  that  ifthedatasetinputto  the 
KDE  algorithm  contained  a  very  dense  cluster  of  points,  the  bandwidths  selected  by  the 
algorithm  were  too  small,  and  the  resulting  KDE  exhibited  features  such  as  those  seen  in 
Figure  32. 


A  possiblesoluti on  to  this  problem  would  beta  generateseparateKDEsforthecluster  and  the 
remai  ni  ng  poi  nts,  then  taki  ng  a  wei  ghted  combi  nati  on  of  thetwo  KDEs.  This  can  beachi  eved 
by  the  procedure: 

1.  Determinea  grid  over  which  the  final  KDE  is  to  degenerated. 

2.  Identify  a  small  setof  grid  squares  containing  high  densities  of  points. 

3.  Generate  a  KDE  for  the  poi  nts  within  the  small  setof  grid  squares. 

4.  Generate  a  separate  KDE  for  the  remai  ni  ng  poi  nts. 

5.  Takea  weighted  combination  of  thetwo  KDEs. 
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Note  that  a  KDE  is  represented  by  a  matrix  where  each  cell  of  the  matrix  contains  the 
esti  mated  probabi  I  ity  mass  for  a  gri  d  squareTherefore,  the  wei  ghted  combi  nati  on  referred  to 
in  Step  5  of  the  above  procedure  can  be  calculated  by  the  foil  owing  formula: 


KDEfinal  — P  ^^DEcluster  "E (1"P)  XKDEscatter 


where  KDEduster  is  the  KDE  based  on  the  points  in  the  chosen  set  of  dense  grid  squares, 
KDEscatier  is  the  KDE  based  on  the  poi  nts  i  n  the  remai  ni  ng  poi  nts,  scattered  outsi  dethe  chosen 
set  of  gri  d  squares,  p  is  the  proportion  of  i  mpact  poi  nts  that  I  i  e  withi  n  the  chosen  set  of  gri  d 
squares. 

Figure  33,  below,  shows  the  KDE  generated  by  this  method  for  the  scenario  shown  in 
F  i  g u  r e  32.  N  ote th at  as  th  i  s  method  u ses  d  i  ff erent  ban d  w  i  d ths  f or  d  i  ff erent  sets  of  poi  nts,  th  i  s 
is  actually  an  exampleof  a  KDE  method  using  dynamically  determined  bandwidths. 


KDE 


Figure  33:  KDE  with  dynamic  bandwidth  selection 

I  n  order  to  develop  a  more  robust  KDE  algorithm,  it  is  necessary  to  ensure  the  selection  of  an 
appropriate  bandwidth,  preferably  via  an  automated  procedure. 

Wecommence  by  examining  theeffect  on  KDE  generation  of  the  existence  of  a  tight  cluster  of 
pointsinthedataset.  We  have  created  data  sets  in  which  between  10%  and  90%  of  the  poi  nts 
are  specifically  allocated  within  onegrid  square  and  the  remainder  areuniformly  distributed 
over  a  larger  region.  We  then  generate  KDEs  for  each  case  and  identify  where  the  KDEs 
become  inappropriate.  Figure  34 to  Figure 42  show  the  difference  between  KDEs  generated 
usi  ng  the  "standard"  static  bandwidth  procedure  and  usi  ng  theprocedureouti  i  ned  i  n  Steps  1- 
5,  above. 
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Log10  of  Kernel  Density  Estimate 
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Figure  34:  Static,  on  the  left,  and  dynamic,  on  the  right,  bandwidth  KDEs  with  a  10%  cluster.  Note 
that  both  plots  are  log  base-10  of  the  estimated  probability  density. 


Log1 0  of  Kernel  Density  Estimate 
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Figure  35:  Static  and  dynamic  bandwidth  KDEs  with  a  20%  cluster.  Note  that  both  plots  are  log  base- 
10  of  the  estimated  probability  density. 


Logic  of  Kernel  Density  Estimate 
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Figure  36:  Static  and  dynamic  bandwidth  KDEs  with  a  30%  cluster.  Note  that  bothplots  are  log  base- 
10  of  the  estimated  probability  density. 
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LoglO  of  Kernel  Density  Estimate 
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Figure  37:  Static  and  dynamic  bandwidth  KDEs  with  a  40%  cluster.  Note  that  bothplots  are  log  base- 
10  of  the  estimated  probability  density. 
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Figure  38:  Static  and  dynamic  bandwidth  KDEs  with  a  50%  cluster.  Note  that  both  plots  are  log  base- 
10  of  the  estimated  probability  density. 
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Figure  39:  Static  and  dynamic  bandwidth  KDEs  with  a  60%  cluster.  Note  that  both  plots  are  log  base- 
10  of  the  estimated  probability  density. 
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Logic  of  Kernel  Density  Estimate 
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Figure  40:  Static  and  dynamic  bandwidth  KDEs  with  a  70%  cluster.  Note  that  bothplots  are  log  base- 
10  of  the  estimated  probability  density. 
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Figure  41:  Static  and  dynamic  bandwidth  KDEs  with  a  80%  cluster.  Note  that  both  plots  are  log  base- 
10  of  the  estimated  probability  density. 
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Figure  42:  Static  and  dynamic  bandwidth  KDEs  with  a  90%  cluster.  Note  that  both  plots  are  log  base- 
10  of  the  estimated  probability  density. 
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Notethatin  all  of  the  plots  shown  in  Figure  34  to  Figure  42,  thecolour  of  thegrid  squarein 
which  the  cluster  occurs  is  somewhat  obscured  by  the  impact  points  plotted  in  this  grid 
square.  Flowever,  in  each  of  these  plots,  the  grid  square  in  which  the  cluster  occurs  will 
contain  a  large  probability  mass,  and  would  therefore  appear  as  a  red  grid  square  in  these 
plots,  if  it  were  not  obscured. 

For  thescenari  os  shown  in  Figure  34  to  Figure42,  with  up  to  50%of  impact  pointsin  asingle 
grid  square,  thestatically  generated  KDE  and  dynamically  generated  KDE  arevery  similar. 
Flowever,  with  70%  in  a  single  grid  square,  the  effective  ranges  of  the  two  KDEs  arevery 
d  i  fferent,  w  hi  I  e  at  80%  there  are  patches  of  effect!  vel  y  0  probabi  I  i  ty  w  i  thi  n  the  convex  hu  1 1  of 
the  data.  This  suggests  that  a  dynamic  bandwidth  KDE  generation  procedurethatsearchesfor 
cl  uster  densiti  es  of,  say,  30%  of  i  mpact  poi  nts  wi  1 1  provi  d  every  si  mi  I  ar  KDEs  for  I  ower  cl  uster 
densities,  and  effectively  "spread"  probability  mass  over  the  wider  impact  area  in  the 
presence  of  higher  impact  densities. 

We  note  that  Figure  34  to  Figure  42  also  highlight  a  potential  limitation  of  the  proposed 
dynamic  bandwidth  procedure.  The  KDEs  generated  by  the  dynamic  bandwidth  procedure 
suggest  an  extremely  tight  boundary  around  the  cluster  area,  and  the  area  immediately 
outsidethe  impact  area  has  a  reasonably  low  i  mpact  probabi  I  ity.  Whilst  this  may  bejustified 
based  on  examination  of  the  numerical  data  purely  in  isolation,  care  must  betaken  in  the 
actual  application  of  the  results  given  potential  uncertainties  in  the  physical  nature  of  the 
process  being  modelled  and  the  method  by  which  the  data  is  being  generated  (numerical 
simulation). 

We  recommend  that  relevant  subject  matter  experts  independently  review  the  KDEs  generated  in  such 
cases  and  provide  advice  on  the  subsequent  construction  of  range  safety  templates,  in  particular  near 
areas  of  dense  concentration  of  impact  points. 

We  recommend  that  if  30%  or  more  of  impact  points  lie  within  a  single  grid  square  that  the  dynamic 
bandwidth  KDE  procedure  described  above  be  employed. 

Weal  SO  notethattheissueidentified  when  poi  nts  aretightly  clustered  also  occurs  for  datasets 
containing  a  dense,  narrow  band  of  points,  as  illustrated  in  thefol lowing  example. 
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Log10  of  Kernel  Density  Estimate 


KDE 


Figure  43:  Static  and  dynamic  bandwidth  KDEs  with  a  concentrated  band  of  impact  points.  Note  that 
both  plots  are  log  base-10  of  the  estimated  probability  density. 

Fi  gure43  shows  KDEs  result!  ng  from  a  dataset  contai  ni  ng  a  dense  narrow  band  of  poi  nts.  The 
static  bandwidth  KDE  demonstrates  a  typical  KDE  based  on  a  dataset  containing  a  dense 
narrow  band  of  points,  and  thedynamic  bandwidth  KDE  demonstrates  how  theKDE  can  be 
modified  by  applying  a  variant  of  the  dynamic  bandwidth  procedure  (details  provided 
below). 

Finally,  if  a  dataset  contains  multiple  clusters  or  bands  of  points,  the  bandwidth  selection 
issues  described  above  do  not  occur,  provided  that  the  clusters  or  bands  are  sufficiently 
spaced  apart.  This  is  demonstrated  by  Figure 44,  below. 


LoglO  of  Kernel  Density  Estimate 
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Figure  44:  Static  bandwidth  KDE  with  a  two  concentrated  bands  of  impact  points 

The  figure  shows  two  high  density  impact  bands,  in  adjacent  "columns"  of  grid  squares. 
There  i  s  suff  i  ci  ent  vari  ati  on  i  n  the  d  ensi  ty  of  i  mpact  poi  nts  i  n  the  x-axi  s  to  ensu  re  a  reasonabi  e 
selection  of  the  bandwidth  parameter,  resulting  in  a  reasonabi  espread  of  theprobability  mass 
over  the  enti  re  i  mpact  area. 
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Asa  consequence  of  the  above  investigation,  a  potential  process  for  dynamic  bandwidth 
selection  that  addresses  the  abovesituation  isas  follows: 

1.  Determine  the  percentage  of  points  in  each  individual  grid  square. 

2.  If  any  of  these  percentages  exceed  a  certain  threshold,  then  separate  the  points  in  the 
densest  grid  square  from  thedataset  and  takea  weighted  combination  of  theKDEsfor 
the  dense  gri  d  square  and  the  remai  ni  ng  poi  nts. 

3.  If  none  of  the  percentages  in  Step  1  exceed  the  threshold,  then  determine  the 
percentage  of  points  in  each  row  and  column  of  grid  squares. 

4.  I  f  any  of  the  percentages  i  n  Step  3  exceed  a  certai  n  threshol  d,  then  separate  the  poi  nts 
i  n  the  most  dense  row  or  col  umn  from  thedataset  and  takea  wei  ghted  combi  nati  on  of 
theKDEsfor  the  dense  row  or  column  and  the  remaining  points. 

5.  If  none  of  the  percentages  in  Step  3  exceed  the  threshold,  then  generate  a  KDE  using 
thestatic  bandwidth  method  developed  previously  in  the  RSTT  project. 
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5.  KDE  Isotropy 

5.1  Observations  from  impact  data 

Fu  rther  i  n  vesti  gati  on  of  the  grow  i  ng  col  I  ecti  on  of  RSTT  i  mpact  data  sets  has  reveal  ed  that  su  b- 
optlmal  kernel  density  estimates  are  produced  In  some  Interesting  cases.  As  shown  In 
Flgure45,  the  two  dimensional  formula  presented  In  section  2.2.1  produces  a  good  kernel 
density  estimate  for  data  randomly  scattered  about  the  Y  axis.  In  this  case,  the  X  and  Y 
band  widths  are  treated  Independently  and  correspond  to  the  diagonal  bandwidth  matrix: 
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Figure  45:  Diagonal  bandwidth  matrix  KDE  for  impacts  randomly  distributed  about  the  X  and  Y  axis 

FI  owever,  I  n  a  number  of  the  data  sets  exami  ned  theground  I  mpacts  appeared  to  be  randomly 
distributed  about  axes  rotated  with  respect  to  the  X  and  Y  axes.  A  representative  ground 
I  mpact  distribution  Isshown  In  Flgure46.  Whenthedlagonal  bandwidth  matrix  Is  applied  In 
this  case,  the  kernel  density  estimate  I  snot  as  optimal  as  the  example  shown  In  Figure  45.  For 
thepurposes  of  thisdiscusslon,  theground  Impacts  shown  In  Flgure46areslmplyaforty-flve 
degree  rotation  of  the  Impacts  presented  In  Flgure45.  The  KDE  In  Figure  46  Is  a  less 
conservative  result  than  the  KDE  shown  In  Figure  45  as  several  I  mpact  points  now  lleinthe 
1x10’^  probability  region. 
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Impact  point  <listril)iitioii 


Figure  46:  Diagonal  bandwidth  matrix  KDEfor  example  ground  impact  distribution 


Thediscovery  of  cases  that  are  cl  early  better  represented  by  bandwidths  defined  about  axes 
rotated  with  respect  to  X  and  Y  has  motivated  some  investigation  into  using  a  non-diagonal 
bandwidth  matrix: 


Defining  the  terms  of  the  covariance  matrix  is  not  a  trivial  task  and  has  received  limited 
treatment  i  n  the  I  iterature.  Recent  work  by  Zhang  et  al .  [5,  6]  outi  i  nes  M  arkov  Chai  n  M  onte 
Carlo  algorithms  for  estimating  the  bandwidth  matrix  parameters.  Duong  etal.  [7, 8]  discuss 
the  application  of  plug-in  algorithms,  biased  cross-validation  and  smooth  cross-validation 
algorithms  for  defining  the  bandwidth  matrix.  With  all  proposed  techniques  the  observed 
performance  must  be  balanced  agai  nst  the  computational  effort  requi  red  for  large  data  sets. 
Based  ontheobserved  cases  illustrated  here,  we  proposeoneapproach  for  deriving  theterms 
ofthefull  bandwidth  matrix  for  guided  weapon  ground  impact  data. 

5.2  I  m pact  coord i  nate  correl ati on 

The  two  dimensional  formulafor  /(x,  j;)  presented  inthesection  2.2.1ismostapplicablein 
cases  w  h  ere  X,  and  Y,  are  sampled  independently.  In  reality,  thecoordinates  of  theimpact  are 
not  independent.  The  correlation  between  the  X  and  Y  coordinates  of  the  sampled  data  is 
included  in  the  kernel  density  estimate  as  follows: 


f{x,y)  =  -iK{x-X^,y-Y„U). 
n 
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whereH  isthefull  bandwidth  matrix: 


The  kernel,  as  represented  in  the  above  equation,  is  now  two-dimensional  and  dependent  on 
thebandwidth  operator  H  .Ideally,  H  should  not  bedegenerate,  which  will  be  the  case  for  real 
2D  impact  data.  However,  in  cases  where  the  determinant  is  small,  the  resolution  error  must 
be  used  to  set  a  minimum  value  of  thebandwidth  elements.  Note  that  this  problem  exists 
regardless  of  what  formula  is  being  used. 

If  the  normal  distribution  represents  the  kernel,  the  estimator  is  the  sum  of  the  bivariate 
normal  distributions: 
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The  matrix  H2  is  the  equivalent  of  the  covariance  matrix  expressed  in  terms  of  bandwidth 
matrix  H .  The  elements  of  the  matrix  H2  can  be  found  using  the  one-dimensional  formula 
along  the  principal  axes  of  the  distribution  and  then  rotating  the  matrix  back  to  the  original 
coordinates.  The  principal  axes  of  the  distribution  are  the  coordinates  in  which  the  XY- 
correlation  is  zero  and  correspond  to  the  eigenvectors  of  the  covariance  matrix  S: 

_  (x,-x)(Y,-Yy 
ix^-x)iy-Y)  iy-Yf  J 


The  covariance  matrix  represented  in  principal  axes  coordinates  is  a  diagonal  matrix 
consisting  of and from  theXY-i  ndependentformula.  Suppose  that  thediagonal  terms  are 
Si  and  S2  (eigenvalues)  and  the  main  eigenvector  is  (sx,sy),  then: 


where 
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U  is  the  rotation  matrix  to  the  principal  axes  with  the  properties  |U|  =  1  and 
=  U(5^  ^  -s^)  .Thebandwidth  calculated  along  the  principal  axescan  betranslated  back 
to  the  ori  gi  nal  coord i  nate  system  usi  ng: 
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N  otethat  only  matrices  S  and  H2  are  rotated,  not  the  coordinates  of  the  impact  points.  The 
elements/ji  and  hi  are  calculated  as  per  the  one-dimensional  case  using  Si  and  Si  values.  If  the 
valueS2  issmall  it  should  be  increased  to  the  resolution  error  factor  (Si  cannot  besmal  I  sinceit 
isthemain  eigenvalue).  A  small  S2  value  corresponds  to  the  casew  here  all  pointslieon  aline 
or  are  very  close  to  one. 

Iftheformul  a  from  the  section  2.2.3.3isused  for  calculating  bandwidth,  then  I QR  should  be 
calculated  along  the  principal  axes  of  S.  This  is  done  by  projecting  each  point  onto  the 
principal  axes  via: 


The  new  Pi  coordinates  are  then  used  to  calculate/ti  and  Q,-  coordinates  to  calculated. 
Figure47showstheKDE  produced  using  the  non-diagonal  bandwidth  matrix  for  the  example 
data  first  presented  in  Figure  46.  TheKDE  appears  to  be  more  consistent  with  the  expected 
result  demonstrated  in  Figure45forthenon-rotated  ground  impact  data. 


y-Mis:  Distance  from  missile  launch  (m) 


Figure  47:  Full  bandwidth  matrix  KDEfor  example  ground  impacts 

It  is  possible  to  predict  a  distribution  thatwill  notbewell  represented  by  a  KDE  generated 
using  the  non-diagonal  bandwidth  matrix  outlined  here,asshownin  Figure48.Thisexample 
is  a  hypothetical  casethat  has  not  been  observed  in  the  aval  I  able  data  sets.  Figure  48  shows 
agai  n  that  cl  usteri  ng  (seesecti  on  4)  can  cause  the  global  approach  to  bandwidth  esti  mation  to 
produce  a  non-optimal  KDE  in  some  cases.  To  date,  there  have  not  been  any  identified  cases 
in  the  data  that  result  in  a  less  optimal  KDE  when  using  the  non-diagonal  bandwidth  matrix 
as  compared  to  using  a  diagonal  bandwidth  matrix. 
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Iinp4ict  point  distiilxition 
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Figure  48:  Data  set  with  non-optimal  KDE  using  a  non-diagonal  bandwidth  matrix 


5.3  Computational  efficiency  and  accuracy 

Calculating  the  grid  values  of  the  estimator  f{x,y)  usingtheaboveformularequiresOfnM^) 
exponents,  whereM  isthegrid  size  and  n  is  number  of  impacts.  It  is  possible,  however,  to 
reduce  the  number  of  calculations  by  factorising  the  exponent  in  the  bivariate  normal 
distribution.  Let  us  denote  the  coordinates  of  fc-th  impact  as  (XfoYJ.  The  bivariate  normal 
distribution  can  then  bewritten  in  a  well  known  covariance  form  as: 
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The  expression  can  be  factorised  into  the  foii  owing  form: 


k=\ 


with 


=  exp 
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This  form  is  equivaient  to  the  originai  covariance  form  (which  can  be  verified  by  direct 
substitution),  but  now  matrices  and  oniy  need  to  be  caicuiated  once  for  aii 
combinations/,;,  and fc.Thiswiii  requireabout 0(2nM>  exponent evaiuations.Thissymmetry 
is  a  consequence  of  the  grid  being  presented  asa  iatticeaiigned  aiong  X  and  Y  axes  and  the 
bivariatenormai  distribution  being  symmetricai  reiati  veto  its  own  principai  axis.  Using  theB 
matrices,  the  exponents  can  now  be  caicuiated  for  0(M^+2nM)  terms,  which  is  i ess  than  the 
0(nlvf)  terms  required  previousiy. 

There  i  s  a  n  u  meri  cai  si  d  e  effect  to  th  i  s  approach  d  u  e  to  i  i  mi  tati  ons  i  n  machi  ne  preci  si  on .  i  n  the 
originai  covar i  an ce f or m  of  the  esti  mator ,  th e  expon ent  i  s  ai  w  ays  negati  ve.  W  h i  i  e  i  t  can  h ave  a 
iargeabsoiutevai  uewhen  thetest  point  isfar  from  the  i  mpact  poi  nt  under  consideration,  the 
negative  vaiue  means  it  has  minimai  contribution  to  the  sum  and  so  does  not  introduce 
significant  numericai  error.  When  using  the  estimator  form  containing  B  matrices,  the 
exponent  can  bearbitrariiy  big:  either  negativeor  positive.  Evaiuation  of  the  exponent!  ai  can 
magnify  the  numericai  error,  which  does  not  cancei  out  after  math  x  mu  iti  pi  i  cation  even  if  the 
resuit  is  a  smaii  vaiue.  This  makes  direct  caicuiations,  i.e.  caicuiating  exponents  and  then 
muitipiying  their  vaiues,  undesirabie. 

One  possibiefixto  thisprobiem  invoives  re-defining  the  exponents  in  thefoiiowing  way: 

\nB  =  p  +  aq 


wherea  is  an  arbitrariiy  seiected  parameter,  q  is  some  integer  number  and  0<  p<a  isthe 
remainder.  The  product  of  the  exponentiai  terms  can  then  be  caicuiated  by  muitipiying  p 
terms  and  summi  ng  q  terms  as  the  esti  mator  now  takes  the  form: 


PljX‘“l‘J  „Pik+mik 


n 


k=\ 


The  parameter  fl  is  seiected  sothatfindingp  and  isfast.  After  muitipiying  thep  terms,  theq 
terms  are  summed  to  give  the  factor  expfa)^''.  Aii  terms  of  this  form  can  be  obtained  from  a 
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pre-calculated  table.  Theoretically,  the  sum  of  the  q  terms  is  not  bound  by  any  limits. 
However,  iftheextremelowervalueoftheestimator  evaluation  isexpected  to  be  around  10"^ 
it  is  safe  to  remove  all  table  entries  below  10"^°°,  for  example.  On  the  other  hand,  the  sum 
cannot  be  greater  than  say,  100/ a,  otherwise  the  probability  function  will  exceed  Ibymany 
ord ers  of  magn i  tu d e.  I  n  practi  ce,  i f  a  i  s  chosen  to  be  10,  a  tabi  e  of  si  ze 40  i  s  suff i  ci  ent  because  i  t 
covers  around  40  orders  of  magnitudeinthedensityfunction;for  example,  from  lO'^to  10“. 

Note  that  the  first  exponential  term  in  theaboveformula(Bij*^’)  should  not  be  factored  out  of 
thesummation.  This  is  to  ensure  that  thecorrect  cancel  lati  ons  occur  during  thecal  culati  on  of 
each  product. 

5.4  Conclusion 

Bel  ow  is  the  I  ist  of  steps  necessary  for  the  cal  culati  on  of  the  esti  mati  on  functi  on  (without  the 
numerical  optimisation  discussed  above): 

•  Compute  the  matrix  S 

•  Find  the  main  eigenvector  (this  defines  the  rotation  matrix  U) 

•  Find  both  eigenvalues  Si  and  S2  using  U  or  directly  from  step  2  (correct  S2  if  necessary) 

•  Find  the  two  bandwidth  values^  and  hz  using  the  one-dimensional  formula 

•  Compute  matrix  H 2 

•  Find  H2"^and  det(H2)  (this  is  possible  because det(S)i^0  ) 

•  Build  theestimation  function/(x,y)  using  the  bivariate  normal  distribution. 

The  final  density  function  is  automatically  normalised  to  1. 

This  approach  has  been  shown  to  offer  better  kernel  density  estimates  than  the  diagonal 
bandwidth  matrix  for  a  number  of  cases  found  in  the  data.  As  there  have  not  yet  been  any 
cases  observed  that  will  be  I  ess  optimally  predicted  by  the  non-diagonal  bandwidth  matrix 
outlined  in  this  section  (when  compared  to  predict!  ons  using  the  diagonal  bandwidth  matrix), 
theapproach  has  been  adopted  for  all  RSTT  guided  weapon  ground  impactdata.  Our  analysis 
is  based  on  observed  cases  intheavai  I  able  data,  butitisnotexhaustiveand  highlights  that  the 
prediction  of  non-diagonal  bandwidth  matrices  is  a  challenging  and  potentially  fruitful  area  of 
research. 
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6.  Conclusion  and  areas  for  further  research 

I  n  th  i  s  report  w  e  have  i  nvesti  gated  tech  ni  qu  es  for  anal  ysi  ng  si  mu  I  ati  on  d  ata  of  mi  ssi  I  e  i  mpact 
distri  butions. 

While  we  have  successfully  addressed  the  fundamental  issue  of  defining  techniques  of 
potential  value  in  developing  rangesafety  templates,  itisclearfromtheanalysisthatthereare 
many  factors  to  be  considered  in  generating  an  accurate  statist  cal  model. 

In  particular,  there  can  be  no  "blind"  fully  automated  process  for  producing  a  statistical 
esti  mate  that  i  s  appropri  ate  for  any  and  every  dataset  that  may  ari  se.  1 1  i  s  necessary  to  ensu  re 
that  appropriatestatisti  cal  inputs  have  been  used,  and  that  the  results  obtained  are  consistent 
both  with  the  data  and  with  what  might  reasonably  be  expected  in  reality.  Therefore,  any 
procedure  for  generating  a  kernel  density  esti  mate  must  be  reviewed  by  a  panel  of  experts, 
including  both  statisticians  and  weapons  experts. 

Kernel  Density  Estimation  appears  to  provideagood  basis  for  deriving  rangesafety  templates 
or  W  DA  s  from  discrete  simulated  ground  impact  points. 

6.1  Outcomes 

TheR&D  undertaken  by  CDCIN  and  DSTO  has: 

1.  Qualitatively  described  the  features  of  impact  distribution  data  that  may  affect 
subsequent  statistical  modelling. 

2.  Defined  atechnique,specifically,theuseof  kernel  density  estimation, for  providinga 
statistical  model  of  a  specific  missile  impact  data  set  which  esti  mates  the  probability 
d  ensi  ty  f  u  ncti  on  of  the  i  mpact  d  i  stri  buti  on.Thesolution  proposed  here  i  s  pu  rel  y  d  ata 
analytic  and  as  such  does  not  allow  for  the  incorporation  of  any  substantive 
knowledge. 

3.  Defi  ned  a  techni  quefor  combi  ni  ng  kernel  density  esti  mates  correspondi  ng  to  different 
failure  modes  within  a  single  operational  scenario. 

4.  Defined  a  technique  for  incorporating  information  on  missile  Maximum  Energy 
Boundaries  into  the  analysis  so  as  to  refine  the  impact  zone  probability  density 
function. 

5.  Defined  a  technique  for  using  the  probability  density  function  together  with 
population  density  information  to  obtain  estimated  injury  rates  for  a  given  scenario. 

6.  Defined  a  techni  quefor  using  the  probability  density  function  together  with  range 
boundary  information  to  obtain  an  esti  mate  for  a  missile  leaving  a  given  range. 

7.  Defined  a  technique  for  using  the  probability  density  function  to  determine  a 
conservative,  convex  safety  exclusion  zone  with  given  probability  of  the  missile 
leaving  the  zone. 

8.  Defined  an  approximatetechniquefordefiningaconservativeexclusionzonederived 
from  probability  density  functions  of  different  scenarios. 
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9.  Found  that  KDE  resolutions  beyond  16  x  16  and  32x32  do  not  provide  significantly 
more  accurate  information  and  hence  16  x  16  or  32  x  32  resolutions  appear  to  be 
suitable  for  the  development  of  Range  Safety  Templates. 

10.  Found  that  at  I  east  600  observati  ons  (i  mpact  data  poi  nts)  shoul  d  be  used  i  n  generati  ng 
KDEsfor  a  given  scenario. 

11.  Identified  situations  in  which  the  Kernel  Density  Estimation  process  is  not  robust, 
general  I  y  w  hen  ti  ght  cl  usters  of  d  ata  poi  nts  occu  r  w  i  thi  n  the  d  ata  set.  I  n  su  ch  cases  the 
bandwidth  parameters  automatically  generated  by  the  process  tend  to  be  very  small 
and  the  KDE  generated  consequently  "erratic".  This  report  has  suggested  one  method 
of  dynamic  bandwidth  calculation  to  improve  the  PDF  for  clustered  or  non-normal 
ground  impact  distributions. 

12.  Described  a  covariant  form  of  the  Kernel  Density  Estimator  for  two-dimensional  data 
that  robustly  predicts  the  ground  impact  probability  function  for  a  number  of 
avail  able  data  sets. 

13.  Outlined  a  numerical  approach  to  ensure  computationally  accurate  and  efficient 
results  are  obtained  when  using  the  kernel  density  estimatetechniquewith  real  impact 
data. 

6.2  Further  research 

There  are  a  number  of  areas  in  which  further  research  should  be  carried  out  in  order  to  make 
the  analysis  more  robust  to  the  range  of  potential  operational  scenarios.  The  major  areas 
include: 

1.  The  use  of  kernel  density  estimation  requires  selection  of  certain  so-called 
"  band  w  i  dth"  parameters.  These  parameters  are  the  key  to  control  I  i  ng  the  natureof  the 
overall  estimate  created.  In  the  analysis  provided,  well-known  rules  of  thumb  have 
been  emp I  oyed ,  but  h ave  been  sh ow  n  to  h ave  some  potenti  al  I  i  mi  tati  ons  w  h  i  ch  sh ou  I  d 
be  addressed.  Investigation  of  the  implementation  of  a  more  general  dynamic  or 
adaptive  bandwidth  method  may  yield  positive  results. 

2.  Thenumber  of  simulated  data  points  required  to  generate"sufficiently  good"  kernel 
density  estimates  should  be  investigated  further  in  light  of  changes  to  bandwidth 
selection  algorithms.  Additionally,  a  greater  range  of  impact  distribution  scenarios 
should  be  investigated  using  a  greater  number  of  simulated  points. 

3.  Without  significantly  greater  examination  of  typical  data  related  to  missile  impact 
d  i  stri  buti  ons  w  e  are  u  nabi  e  at  thi  s  ti  me  to  veri  fy  that  creati  ng  a  convex  excl  usionzone 
from  i  ndi  vi  dual  scenari  o  excl  usi  on  zones  wi  1 1  provi  de  a  conservative  excl  usi  on  zone 
for  a  scenari  o  w  hose  input  parameters  I  i  e  betw  een  the  i  n  put  parameters  of  the  know  n 
scenarios.  Such  an  investigation  should  be  carried  out  in  order  to  determine  the  full 
limitations  of  thedefined  procedure. 

4.  Further  development  of  the  Exclusion  Zone  difference  to  measure  the  difference 
between  the  convex  exclusion  zones,  rather  than  the  raw  exclusion  zones,  for  the 
purposes  of  comparing  variations  in  KDE  algorithms  and  data  sizes. 
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5.  The  application  of  anon-diagonal  bandwidth  matrix  in  the  kernel  density  estimation 
has  been  shown  to  be  quite  useful  in  anumber  of  observed  ground  impact  data  sets. 
Fu  rther  expl  orati  on  of  methods  for  p  red  i  cti  ng  the  band  w  i  dth  matri  x  parameters  mi  ght 
identify  an  approach  that  ensures  the  optimal  kernel  density  estimate  is  obtained  in 
most  cases. 
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Appendix  A;  Datafiles  provided  by  DSTO  for  the 
analysis  described  in  this  report 


Filename 

Size 

Description 

fau  1  t5et_al  l_actuators_zeroed_ggml-10000_l.csv 

1.698 

MB 

10,000  observati  ons  correspond  i  ng  to  f ai  1  u  re  mod  e  1 
(zero  deflection)  for  the  spatial  scenario  chosen  to 
illustrate  the  techniques  proposed  in  this  report. 

fau  It5et_al  l_ad;uators_zeroed_ggml-10000_2.csv 

1.658 

MB 

10,000  observations  corresponding  to  failure  mode  1 
(zero  deflection)  for  a  particular  spatial  scenario. 

fau  It5et_al  l_ad;uators_zeroed_ggml-10000_3.csv 

1.704 

MB 

10,000  observations  corresponding  to  failure  mode  1 
(zero  deflection)  for  a  particular  spatial  scenario. 

fau  It5et_al  l_actuators_zeroed_ggml-10000_4.csv 

1.679 

MB 

10,000  observati  ons  correspond  i  ng  to  f  al  1  u  re  mod  e  1 
(zero  deflection)  for  a  particular  spatial  scenario. 

fau  It5et_al  l_actuators_zeroed_ggml-10000_5.csv 

1.709 

MB 

10,000  observations  corresponding  to  failure  mode  1 
(zero  deflection)  for  a  particular  spatial  scenario. 

fau  It5et_al  l_actuators_zeroed_ggml-20000_6.csv 

3.396 

MB 

Exactl  y  the  same  as 

faultset_all_actuators_zeroed_ggml-10000_l.csv  with  a 
further  10,000  observations  added  at  the  beginning 
(total  20,000  observations). 

fau  It5et_si  ngl  e_ad;uator_f  reeze_ggml-20000_3.csv 

3.507 

MB 

20,000  observati  ons  correspond  i  ng  to  a  d  i  fferent  f  ai  1  u  re 
mode  (single  actuator  freeze)  for  a  particular  spatial 
scenario. 

nofault_ggm_000_l-20000.csv 

3.427 

MB 

20,000  observations  corresponding  to  failure  mode  0 
(no  fal  1  u  re)  for  the  same  spatl  al  scenarl  o  as 
faultset_single_actuator_freeze_ggml-20000_3.csv  and 
faultset_all_actuators_zeroed_ggml-10000_l.csv. 

fault5et_all_ad;uators_zeroed_ggm_007.1-50000.c5v 

10.038 

MB 

10,000  for  each  of  five  different  spatial  scenarios  which 
are  all  identical  except  for  the  initial  x-di stance  between 
the  launcher  and  target. 

fault5et_all_actuators_zeroed_ggm_008.l-5000.csv 

0.99 

MB 

A  further  1000  observations  at  each  of  thefivescenarios 
i  n  fau  1  tset_al  l_actu  ators_zeroed_ggm_007.  l-50000.csv 
correspond  i  ng  to  a  d  i  fferent  i  nterval  of  the  fai  1  u  re  ti  me. 

faultset_all_actuators_zeroed_ggm_001.1- 

480000.CSV 

94.664 

MB 

10,000  observations  at  each  of  48  different  spatial 
scenarl  os.  These  scenarl  os  consi  st  of  al  1  possi  hi  e 
combinations  of  six  different  initial  positions  and  eight 
different  initial  directions  of  the  target. 

faultset_all_actuators_zeroed_ggm_009.l-50000.csv 

9.522 

MB 

50,000  observati  ons  at  a  si  ngl  e  spatl  al  scenarl  o.  Thi  s 
scenarl  o  i  s  al  so  sy  mmetrl  c  to  the  scenari  o  i  n 
faultset_all_actuators_zeroed_ggml-10000_l.csv. 

fau  1  tset_al  l_actuators_zeroed_ggm_000_l-  lOOOO.csv 

1.66 

MB 

Exactl  y  the  same  as 

faultset_all_actuators_zeroed_ggml-10000_l.csv. 
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Appendix  B;  Typical  scatter  plots  and  kernel  density 
estimates  for  samples  of  various  sizes 
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Figure  49:  Scatterplot  for  the  dataset  of 50,000  observations  from  the  scenario  used  to  investigate  the 
number  of  points  required 


LoglO  of  Kernel  Density  Estimate 


H 

<-2 

■ 

<-3 

■ 

<-4 

(D 

<-5 

□ 

<-6 

□ 

<-7 

□ 

<-8 

■ 

<-9 

□ 

<-10 

□ 

<-11 

□ 

<-12 

□ 

<-13 

ED 

<-14 

■ 

<-15 

■ 

<-16 

■ 

<-17 

■ 

<-18 

■ 

<-19 

□ 

<-20 

-10  0  10  20  30 

Distance  from  missile  launch  (km) 


Figure  50:  Kernel  density  estimate  for  the  dataset  of  50,000  observations  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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Conservative  Convex  Exclusion  Zone 
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Figure  51:  10-^  exclusion  zone  for  the  dataset  of  50,000  observations  from  the  scenario  used  to 
investigate  the  number  of  points  required 


Impact  Point  Distribution 


0  5000  10000  15000  20000 


Distance  from  missile  launch  (m) 

Figure  52:  Typical  scatterplot  for  a  sample  of  size  5000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  53:  Typical  scatterplot  for  a  sample  of  size  10,000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  54:  Typical  scatterplot  for  a  sample  of  size  15,000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  55:  Typical  scatterplot  for  a  sample  of  size  20,000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  56:  Typical  scatterplot  for  a  sample  of  size  25,000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  57:  Typical  scatterplot  for  a  sample  of  size  30,000  from  the  scenario  used  to  investigate  the 
number  of  points  required 
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Figure  58:  Typical  kernel  density  estimate  for  a  sample  of  size  5000  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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LoglO  of  Kernel  Density  Estimate 


Figure  59:  Typical  kernel  density  estimate  for  a  sample  of  size  10,000  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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Figure  60:  Typical  kernel  density  estimate  for  a  sample  of  size  15,000  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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LoglO  of  Kernel  Density  Estimate 


■ 

<-2 

■ 

<-3 

■ 

<-4 

□ 

<-5 

□ 

<-6 

□ 

<-7 

□ 

<-8 

B 

<-9 

□ 

<-10 

□ 

<-11 

□ 

<-12 

□ 

<-13 

□ 

<-14 

B 

<-15 

B 

<-16 

B 

<-17 

B 

<-18 

B 

<-19 

□ 

<-20 

0  10  20 
Distance  from  missile  launch  (km) 


Figure  61:  Typical  kernel  density  estimate  for  a  sample  of  size  20,000  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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Figure  62:  Typical  kernel  density  estimate  for  a  sample  of  size  25,000  from  the  scenario  used  to 
investigate  the  number  of  points  required 
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LoglO  of  Kernel  Density  Estimate 
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Figure  63:  Typical  kernel  density  estimate  for  a  sample  of  size  30,000  from  the  scenario  used  to 
investigate  the  number  of  points  required 


Conservative  Convex  Exclusion  Zone 


Figure  64:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  5000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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Conservative  Convex  Exclusion  Zone 
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Figure  65:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  10,000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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Figure  66:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  15,000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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Conservative  Convex  Exclusion  Zone 
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Figure  67:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  20,000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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Figure  68:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  25,000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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Conservative  Convex  Exclusion  Zone 
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Figure  69:  10-^  exclusion  zone  based  on  typical  kernel  density  estimate  for  a  sample  of  size  30,000 from 
the  scenario  used  to  investigate  the  number  of  points  required 
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